
Why Building R from Source on Linux Offers Unrivaled Performance

In this exciting blog post, we will embark on a journey to build the incredible R software for statistical computing and graphics from its very source. Trust me, I've got some fascinating reasons to convince you why building from scratch trumps the traditional method of installing R software from your Linux distro's package manager.

I might have already scared away a few readers or discouraged those seeking a quick fix. But for those brave souls who choose to stay, I assure you, this is going to be an absolute delight and a truly worthwhile investment.

Prerequisites

Before we dive into the thrilling adventure of building the remarkable R software from scratch, let's make sure you have everything you need. Here are a couple of prerequisites that will set you up for success:

Firstly, you'll need a Linux system to work with. Whether it's a physical machine, a virtual machine, or even a cloud-based setup, as long as you have Linux installed, you're good to go.

Secondly, ensure that you have administrative access and the necessary permissions to build and install software. This will enable you to unleash the full potential of R and its extraordinary capabilities.

Now that we have all the essentials covered, let's embark on this exhilarating journey together!

Why Build from Scratch?

There are numerous compelling reasons to opt for building a software package from its source code, particularly when dealing with computationally intensive programs. The primary advantage lies in performance optimization. By generating the binaries on the same machine where you'll be conducting the computational analysis, you can tailor the compilation specifically for your hardware and available dependencies. However, this doesn't mean you have to build everything from scratch, including the dependencies. While it is possible, I don't anticipate a significant performance boost from rebuilding all the dependencies from source.

Furthermore, there are other benefits to consider. For instance, building from source allows you to utilize the latest stable version that may not be available through your Linux distribution's package manager. In some cases, certain distributions may not even have a package manager at all.

Additionally, building from scratch offers the flexibility to customize your R software by adding or removing specific capabilities based on your unique requirements. Ultimately, building from source provides greater flexibility and improves performance, making it a worthwhile endeavor.

Finally, don't let the idea of building from scratch intimidate you. While it may require some time and research, I assure you that we will navigate this journey together. So take a moment to sit back, relax, and perhaps indulge in your favorite cup of coffee or tea as we dive into this exciting endeavor.

Installing the Dependencies

Before we can dive into the exciting process of building the incredible R software from scratch, we need to make sure we have all the necessary dependencies and helpful tools. While I'll strive to remain distro agnostic, I'll provide examples using Dandified Yum or "dnf" for Fedora/RHEL-based systems, and the Advanced Package Tool or "apt" for Debian-based systems. Don't worry if you're using a Fedora/RHEL-based system that uses "yum" – you can easily follow along by replacing "dnf" with "yum".
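If you're not sure which of these package managers your system provides, a small shell check can tell you. The following is only a sketch: the function name detect_pkg_manager is made up for illustration, and it looks for apt-get, which ships alongside apt on Debian-based systems.

```shell
#!/bin/sh
# detect_pkg_manager (hypothetical helper): print the first package manager
# from the list below that is found on the PATH, or "unknown" if none is.
detect_pkg_manager() {
    for candidate in dnf yum apt-get; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "unknown"
    return 1
}

pm=$(detect_pkg_manager) || true
echo "Package manager: $pm"
```

If the result is "unknown", your distribution uses a different tool, and you'll need to translate the package names in the following steps yourself.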

It's worth noting that there may be slight variations in the naming convention of the software libraries for different package managers. While I'll cover a broad range of possibilities, you might need to do a bit of research if your package manager can't find a specific library we need.

To begin, we'll install some essential utilities that will help us retrieve files from the web, extract compressed files, and provide the necessary compilers for building R. We'll update our system while we are at it too. Depending on your Linux system configuration, some of these may already be installed.

On Fedora/RHEL-based systems (dnf or yum)

sudo dnf update -y
sudo dnf install wget tar git
sudo dnf install gcc gcc-gfortran gcc-c++ make glibc-locale-source cmake

On Debian-based systems (apt)

sudo apt update
sudo apt install wget tar git
sudo apt install build-essential
sudo apt install gfortran cmake
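Once the installation finishes, it can be worth confirming that the tools are actually reachable from your shell. Here is a minimal sketch; missing_tools is a hypothetical helper name, and the list of commands simply mirrors the utilities and compilers installed above.

```shell
#!/bin/sh
# missing_tools (hypothetical helper): print every command from the argument
# list that cannot be found on the PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools wget tar git gcc g++ gfortran make cmake)
if [ -z "$missing" ]; then
    echo "All build tools found."
else
    echo "Still missing:"
    echo "$missing"
fi
```

Anything listed under "Still missing" is a good candidate for the search-by-distribution approach described later in this section.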

Next we will install necessary dependencies for our R build:

On Fedora/RHEL-based systems (dnf or yum)

sudo dnf install ncurses ncurses-devel readline readline-devel openssl-devel libcurl-devel cairo-devel libXt-devel libtiff-devel libjpeg-turbo-devel java-11-openjdk-devel

On Debian-based systems (apt)

sudo apt install libncurses-dev libxml2-dev libreadline-dev libssl-dev libcurl4-openssl-dev libzip-dev libbz2-dev libcairo-dev libxt-dev libtiff-dev libjpeg-dev

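Install the OpenJDK development kit (see the note below about Debian-based systems and OpenJDK 11 versus OpenJDK 17). The "-jdk" package provides the Java compiler and headers; the "-source" package only adds the JDK's own source files on top of it.

sudo apt install openjdk-11-jdk

If installing OpenJDK 17, use the following command:

sudo apt install openjdk-17-jdk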

Note: For users of Debian-based systems such as Debian 12, obtaining OpenJDK 11 is no longer as straightforward as before. The default OpenJDK on Debian 12 is OpenJDK 17, and earlier versions have been removed from its repositories. You can still build R with only OpenJDK 17; I have tested building R with OpenJDK 17 on Debian 12 with success. If you would prefer OpenJDK 11, there are several ways around this hurdle, depending on the Debian-based system you're using. The most definitive solution is to build OpenJDK 11 from source. Another option is to add an older or a development repository that still provides OpenJDK 11 to your package manager's configuration. I won't delve into building OpenJDK from source (which could be a topic for a separate blog post), but here is a brief overview of how to add the Debian unstable repository to your package manager's source list. Open the source list with root privileges in your preferred text editor (in this example, I'll use vi):

sudo vi /etc/apt/sources.list

Keep the existing "bookworm" lines as they are and add a new line for the unstable suite, "sid". Each "deb" line takes exactly one suite name after the URL, so "sid" needs its own entry:

deb http://deb.debian.org/debian/ sid main

After refreshing the package index, you will be able to install OpenJDK 11 by running:

sudo apt update
sudo apt install openjdk-11-jdk
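To confirm which Java version ended up on your PATH, you can parse the banner printed by "java -version". This is a minimal sketch: java_major is a hypothetical helper name, and it assumes the usual version "X.Y.Z" banner format.

```shell
#!/bin/sh
# java_major (hypothetical helper): print the major version found in a
# `java -version` banner such as: openjdk version "17.0.9" 2023-10-17
java_major() {
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$ver" in
        "")  return 1 ;;                          # no version banner found
        1.*) ver=${ver#1.}; echo "${ver%%.*}" ;;  # legacy scheme: 1.8.0_392 -> 8
        *)   echo "${ver%%[-.+]*}" ;;             # modern scheme: 17.0.9 -> 17
    esac
}

# java -version writes its banner to stderr, hence the 2>&1.
if major=$(java_major "$(java -version 2>&1)"); then
    echo "Detected Java major version: $major"
else
    echo "No Java runtime detected on the PATH."
fi
```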

There you have it, my friends! Piece of cake, right? However, during this process, you might encounter a few bumps in the road, especially if you're using a less common Linux distribution. Don't fret! If you can't find a particular package, simply perform a quick Google search with your distribution name and the package you're missing. For example, if you're using Debian 12 and are struggling to install gfortran, just search "Debian 12 install gfortran". You'll likely stumble upon a helpful blog, a Stack Exchange question, or a forum thread that will guide you through the installation process. And if all else fails, don't hesitate to reach out to your distribution's mailing list or forum for assistance. Oh, and did I mention that we also offer a comprehensive package for R statistical analysis? So, if you need any further support, feel free to get in touch with us too! Alright, let's now move on to the exciting part: building R!

Building R Statistical Software

To start the process of building R, you'll need to download the source code. You can find it on the official R website. Choose the version you want to build or opt for the latest stable version. For those feeling adventurous, you can even go for the latest development version.

In the example below, I'll be downloading the latest stable version (4.3.2 at the time of writing) from the mirror provided by the National Institute for Computational Sciences (NICS) at the University of Tennessee, Knoxville, which happens to be the nearest mirror to my location.

wget https://mirrors.nics.utk.edu/cran/src/base/R-4/R-4.3.2.tar.gz
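An interrupted download can leave a truncated archive behind, so a quick integrity check may save some confusion later. The sketch below only tests whether the file can be read as a gzip-compressed tarball; it does not verify a checksum. The helper name archive_ok is hypothetical, and the file name assumes the 4.3.2 download shown above.

```shell
#!/bin/sh
# archive_ok (hypothetical helper): succeed only if tar can list the whole
# gzip-compressed archive without errors.
archive_ok() {
    tar -tzf "$1" >/dev/null 2>&1
}

archive="R-4.3.2.tar.gz"   # assumed file name from the wget command above
if [ ! -f "$archive" ]; then
    echo "$archive not found in $(pwd)"
elif archive_ok "$archive"; then
    echo "$archive looks intact."
else
    echo "$archive is unreadable; try downloading it again."
fi
```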

After the download is finished, extract the archive, navigate into the new directory, and configure the build. We can do this by running the following commands:

tar -xzf R-4.3.2.tar.gz
cd R-4.3.2/
./configure

In the Linux world, running ./configure is the preparation phase of a build. It's a script that gets everything ready to build a program from its source code. When you download source code, it's like receiving a box of parts and instructions. The ./configure script checks your system, making sure you have the necessary tools and pieces (like the software libraries we installed earlier) to put the program together.

It examines your computer and figures out how to adapt the software to work with your specific setup. This might involve checking what other software you have installed, what capabilities your computer has, and what features you might want to include or exclude from the final program.

Once ./configure finishes its job, it creates a blueprint—a file called Makefile. This Makefile contains all the instructions needed to actually build the software using the make command. So, think of ./configure as the preparation step that helps tailor the software to your system before you start assembling it with make.
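Running ./configure without arguments accepts the defaults. If you want to tailor the build, options and compiler flags can be passed on the command line. The sketch below only prints a candidate command rather than running it. The options --prefix, --enable-R-shlib and --enable-memory-profiling are accepted by R's configure script, but the particular combination, the -march=native flag and the helper name configure_command are illustrative assumptions, not settings used in this post. Note that -march=native ties the binaries to the CPU type of the build machine.

```shell
#!/bin/sh
# configure_command (hypothetical helper): print a customised ./configure
# invocation. $1 is the installation prefix, $2 the optimisation flags that
# are applied to the C, C++ and both Fortran compilers.
configure_command() {
    printf './configure --prefix=%s --enable-R-shlib --enable-memory-profiling' "$1"
    printf ' CFLAGS="%s" CXXFLAGS="%s" FFLAGS="%s" FCFLAGS="%s"\n' "$2" "$2" "$2" "$2"
}

# Assumed defaults; override them through the PREFIX and OPT_FLAGS variables.
configure_command "${PREFIX:-/usr/local}" "${OPT_FLAGS:--O2 -march=native}"
```

You can review the printed command and then paste it into the shell from inside the extracted source directory. Running "./configure --help" lists every option your R version supports.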

After completing the configuration process, a comprehensive summary will be displayed. If the script stops with an error instead, the most likely cause is a missing dependency. I recommend searching for the specific missing dependency for your Linux distribution to find instructions on how to install it. Sometimes the dependency is already installed, but in a location that isn't on your PATH or isn't searched by the compiler. Once you have attempted to resolve the issue and are ready to try again, simply rerun "./configure" (run "make clean" first if an earlier build attempt left files behind). Following a successful configuration, you can expect to see output similar to the following (specifics may vary depending on your system's configuration and dependencies).

R is now configured for x86_64-pc-linux-gnu

  Source directory:            .
  Installation directory:      /usr/local

  C compiler:                  gcc  -g -O2
  Fortran fixed-form compiler: gfortran  -g -O2

  Default C++ compiler:        g++ -std=gnu++17  -g -O2
  C++11 compiler:              g++ -std=gnu++11  -g -O2
  C++14 compiler:              g++ -std=gnu++14  -g -O2
  C++17 compiler:              g++ -std=gnu++17  -g -O2
  C++20 compiler:              g++ -std=gnu++20  -g -O2
  C++23 compiler:              g++ -std=gnu++23  -g -O2
  Fortran free-form compiler:  gfortran  -g -O2
  Obj-C compiler:            

  Interfaces supported:        X11
  External libraries:          pcre2, readline, curl
  Additional capabilities:     PNG, JPEG, TIFF, NLS, cairo, ICU
  Options enabled:             shared BLAS, R profiling

  Capabilities skipped:        
  Options not enabled:         memory profiling

  Recommended packages:        yes
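If ./configure stops with an error instead of printing a summary like the one above, the details are recorded in the config.log file that the script writes to the current directory. A rough sketch for pulling out the error lines follows; show_configure_errors is a hypothetical helper name, and the pattern assumes the usual "configure:LINE: error: ..." form that autoconf-generated scripts use in that log.

```shell
#!/bin/sh
# show_configure_errors (hypothetical helper): print the error lines recorded
# in a configure log, prefixed with their line numbers in that file.
show_configure_errors() {
    log=${1:-config.log}
    if [ ! -f "$log" ]; then
        echo "No $log found; run ./configure first." >&2
        return 2
    fi
    grep -n -E '^configure:([0-9]+:)? error:' "$log"
}

show_configure_errors config.log || true
```

The lines just above a reported error in config.log usually show the exact check that failed, which tends to reveal the name of the missing library.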

Now it's time to build R! Building R may take some time, so make yourself comfortable. To begin building R, execute the following command:

make
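On a multi-core machine, the build can be shortened by letting make run several jobs at once, and saving the output to a file makes it easier to inspect afterwards. The following sketch does both and reports success based on make's exit status. The helper name cpu_count and the log file name build.log are assumptions for illustration.

```shell
#!/bin/sh
# cpu_count (hypothetical helper): print the number of available CPU cores,
# falling back to 1 if it cannot be determined.
cpu_count() {
    n=$(nproc 2>/dev/null) || n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    echo "$n"
}

jobs=$(cpu_count)
if [ -f Makefile ]; then
    # Run one make job per core and keep the full output in build.log.
    if make -j"$jobs" > build.log 2>&1; then
        echo "Build finished without errors (details in build.log)."
    else
        echo "Build failed; last lines of build.log:"
        tail -n 20 build.log
    fi
else
    echo "No Makefile here; run ./configure first. Would use: make -j$jobs"
fi
```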

As the build process unfolds before your eyes, your screen fills with a mesmerizing flurry of output, almost like you've been transported into the digital world of the Matrix. Amongst the sea of information, you might catch sight of some worrisome-looking warnings, but fear not: compiler warnings are common here and can usually be disregarded. What truly matters is the absence of errors or of any "exited with a non-zero status" message. Either would indicate a problem in the build process that the configuration script failed to detect. If the stars align and the build is successful, the output simply ends with a line akin to the following:

make[1]: Leaving directory '/home/kenneth/R-4.3.2'

Unfortunately, the build process does not provide a clear indication of success, such as a "Build complete!" message. Instead, keep an eye out for a generic "Leaving directory" statement at the end and make sure there are no errors. Once the build finishes successfully, we can proceed to install R. Depending on your system, you may need to run the installation command with elevated privileges using sudo.

sudo make install

There you have it! You have built your own instance of R which you can now enjoy as follows:

R
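For a quick, non-interactive check that the new installation works and includes the capabilities you expected, you could ask Rscript to report them. This is a small sketch; verify_r is a hypothetical helper name, while R.version.string and capabilities() are part of base R.

```shell
#!/bin/sh
# verify_r (hypothetical helper): print the R version and the capabilities
# compiled into this build, or a hint if Rscript cannot be found.
verify_r() {
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript is not on the PATH"
        return 1
    fi
    Rscript -e 'cat(R.version.string, "\n"); print(capabilities())'
}

verify_r || true
```

The capabilities listed should line up with the "Additional capabilities" reported in the configure summary earlier.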

Keep in mind, when installing R packages, you have the option of installing them in the user-space or globally. If you want to install an R package globally, you will need to execute R with elevated privileges. Some systems will have the path for R already configured for elevated users, but that may not always be the case. If you receive a "command not found" error, simply run:

whereis R

Hopefully, you will get a response displaying the absolute path to R. Here's an example on my Oracle Linux 9:

R: /usr/local/bin/R

You can use the absolute path along with sudo to run R with elevated privileges:

sudo /usr/local/bin/R

Summary

As we come to the end of our journey, I hope that your experience in custom building R for your workstation has been nothing short of smooth and successful. The version of R provided by a package manager may lag behind due to various factors such as Linux distribution release policies. This can sometimes result in version-specific dependency issues when installing certain R packages or Bioconductor packages. By taking the time to build R from scratch, you not only gain a deeper understanding of the software but also have the flexibility to tailor it to your specific needs. Additionally, you have the opportunity to create a custom build that is perfectly optimized for the unique hardware of your device.

This blog post is the first in a multi-part series focused on R and bioinformatics. I encourage you to stay tuned for more exciting and informative content. Whether you're a beginner or an experienced user, there will be something for everyone in the upcoming posts.

If you ever find yourself in need of assistance with data analysis using R, don't hesitate to reach out to us. We are here to help and support you in your journey. Our team of experts is well-versed in R and can provide guidance and solutions to any challenges you may encounter.

Thank you for joining us on this journey, and we look forward to continuing to empower you with knowledge and resources for your data analysis endeavors.