rcps-buildscripts

Install Request: Upgrade R and Bioconductor to R 4.1.1 and Bioconductor 3.13

Open • balston opened this issue 3 years ago • 66 comments

We need to install the latest version of R and Bioconductor on the clusters and RStudio services, as users are beginning to use packages that depend on R 4.1, e.g. IN04760228.

R: https://www.r-project.org/
Bioconductor: https://www.bioconductor.org/

balston, Jul 01 '21

Note: the most recent versions of R currently on Myriad, Kathleen and the Data Science Platform are:

R: 4.0.2, Bioconductor: 3.11

balston, Jul 01 '21

IN04757651 may also need R 4.1.

balston, Jul 02 '21

IN:04807854 - request for R 4.1 for INLA

I am working with the package INLA and the currently installed version is over 2 years old, which makes a huge difference as INLA has changed substantially in the last year. I am able to install the latest INLA version but this runs into problems due to the outdated R version.

heatherkellyucl, Aug 06 '21

Latest R is 4.1.1 (released 10th August 2021) so we will be installing this version.

Building on Kathleen.

balston, Sep 01 '21

Check prerequisites for base R

External prerequisites. Already have:

  • gcc-libs/10.2.0
  • compilers/gnu/10.2.0
  • openblas/0.3.13-native-threads/gnu-10.2.0

Need to build:

  • [x] fftw for GNU 10.2.0;
  • [x] GSL for GNU 10.2.0;
  • [x] HDF5 for GNU 10.2.0;
  • [x] udunits for GNU 10.2.0;
  • [x] NetCDF for GNU 10.2.0;
  • [x] PCRE2 for GNU 10.2.0.

balston, Sep 01 '21

Build fftw using:

cd /shared/ucl/apps/build_scripts
module -f unload compilers mpi gcc-libs
module load beta-modules
module load gcc-libs/10.2.0
module load compilers/gnu/10.2.0
./fftw-3.3.9_gnu-10.2.0_install 2>&1 | tee ~/Software/FFTW/fftw-3.3.9_gnu-10.2.0_install.log
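Every build in this thread runs as `./script 2>&1 | tee logfile`. One caveat applies to that pattern. In bash, a pipeline returns the exit status of its last command, which here is `tee`, so a failed build still reports success unless `pipefail` is set. Below is a minimal sketch of a wrapper that preserves the build script's failure status. `run_logged` is a hypothetical helper and is not part of build_scripts.

```shell
#!/usr/bin/env bash
# run_logged LOGFILE CMD [ARGS...]
# Hypothetical helper (not part of build_scripts): runs CMD with stdout and
# stderr copied to LOGFILE via tee. Returns a failure status if CMD (or tee)
# fails, instead of only tee's status.
run_logged() {
    local log="$1"
    shift
    # Make sure the log directory exists before tee tries to write to it.
    mkdir -p "$(dirname "$log")"
    # pipefail makes the pipeline fail when any stage fails; running it in a
    # subshell keeps the option from leaking into the caller's shell.
    ( set -o pipefail; "$@" 2>&1 | tee "$log" )
}
```

For example, `run_logged ~/Software/FFTW/fftw-3.3.9_gnu-10.2.0_install.log ./fftw-3.3.9_gnu-10.2.0_install` would log the same output as the command above and would also return non-zero if the install script fails.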

balston, Sep 01 '21

Build GSL using the above modules and:

./gsl-2.7-install 2>&1 | tee ~/Software/GSL/gsl-2.7_install.log

balston, Sep 01 '21

There is already an hdf5-1.10.6-gcc1020_install script that does the MPI build of HDF5. I'm modifying a copy of it to do the serial build, which is what R needs.

balston, Sep 02 '21

Build HDF5 using:

./hdf5-1.10.6-gcc1020-serial_install 2>&1 | tee ~/Software/HDF5/hdf5-1.10.6-gcc1020-serial_install.log

balston, Sep 02 '21

Build UDUNITS using the above modules and:

./udunits-2.2.28_install 2>&1 | tee ~/Software/netcdf/udunits-2.2.28_install.log

balston, Sep 02 '21

Build NetCDF using the above modules and:

module load hdf/5-1.10.6/gnu-10.2.0
module load udunits/2.2.28/gnu-10.2.0
./netcdf_4.8.1_install 2>&1 | tee ~/Software/netcdf/netcdf_4.8.1_install.log

balston, Sep 02 '21

Build PCRE2 using the above modules and:

./pcre2-10.37_install 2>&1 | tee ~/Software/pcre2-10.37_install.log

balston, Sep 03 '21

Build base R

Now ready to build base R plus the CRAN recommended packages. The build script includes the following module requirements (updated for the correct OpenBLAS module):

require beta-modules
require gcc-libs/10.2.0
require compilers/gnu/10.2.0
require openblas/0.3.13-serial/gnu-10.2.0
require java/1.8.0_92
require fftw/3.3.9/gnu-10.2.0
require ghostscript/9.19/gnu-4.9.2
require texinfo/6.6/gnu-4.9.2
require texlive/2019
require gsl/2.7/gnu-10.2.0
require hdf/5-1.10.6/gnu-10.2.0
require udunits/2.2.28/gnu-10.2.0
require netcdf/4.8.1/gnu-10.2.0
require pcre2/10.37/gnu-10.2.0

For the base build, no MPI module is needed. Building using:

module -f unload compilers mpi gcc-libs
./R-4.1.1_install 2>&1 | tee ~/Software/R/R-4.1.1_install.log

balston, Sep 03 '21

Base R is now installed on Kathleen. The next step is to update the module file, run some simple tests and then add the additional packages.

balston, Sep 03 '21

Module file added and basic tests run after loading the following modules:

module load beta-modules
module load gcc-libs/10.2.0
module load compilers/gnu/10.2.0
module load openblas/0.3.13-native-threads/gnu-10.2.0
module load java/1.8.0_92
module load fftw/3.3.9/gnu-10.2.0
module load ghostscript/9.19/gnu-4.9.2
module load texinfo/6.6/gnu-4.9.2
module load texlive/2019
module load gsl/2.7/gnu-10.2.0
module load hdf/5-1.10.6/gnu-10.2.0
module load udunits/2.2.28/gnu-10.2.0
module load netcdf/4.8.1/gnu-10.2.0
module load pcre2/10.37/gnu-10.2.0
module load r/4.1.1-openblas/gnu-10.2.0
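The basic tests themselves are not listed in this thread. One possible smoke test is to confirm that the R picked up after these module loads really is the expected version. The sketch below assumes that `R` is on `PATH` and that the first line of `R --version` has the form `R version X.Y.Z (...)`. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# r_version_from_output: read `R --version` output on stdin and print the
# version number from the first line starting with "R version ".
r_version_from_output() {
    awk '/^R version / { print $3; exit }'
}

# check_r_version EXPECTED
# Hypothetical smoke test: compare the version reported by the R on PATH
# with EXPECTED; print an error and return 1 on mismatch.
check_r_version() {
    local expected="$1" actual
    actual="$(R --version | r_version_from_output)"
    if [ "$actual" != "$expected" ]; then
        echo "expected R $expected, found '${actual:-none}'" >&2
        return 1
    fi
    echo "R $actual OK"
}
```

After the module loads above, `check_r_version 4.1.1` would catch the case where an older R module is still first on `PATH`.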

balston, Sep 06 '21

Check and Update Prerequisites for Additional R Packages and Bioconductor

The next step is to install the additional R packages using the R-4.1.1_packages_install script in build_scripts. Before this can be run, a number of external prerequisites need to be checked and installed or updated if necessary. As well as the above modules, the following are required:

require perl/5.22.0
require libtool/2.4.6
require freetype/2.8.1/gnu-4.9.2
require graphicsmagick/1.3.21
require python/2.7.12
require sqlite/3.31.1/gnu-9.2.0
require proj.4/7.0.0/gnu-9.2.0
require gdal/3.0.4/gnu-9.2.0
require geos/3.8.1/gnu-9.2.0
require gmt/6.0.0/gnu-9.2.0
require v8/3.15
require protobuf/3.14.0/gnu-9.2.0
require jq/1.5/gnu-4.9.2
require plink/1.90b3.40

These are the versions from the R 4.0.2 build, so the following will definitely need to be updated:

  • [x] sqlite;
  • [x] proj.4;
  • [x] gdal;
  • [x] geos;
  • [x] gmt;
  • [x] protobuf;

Working on these next. Don't need to update V8 at this time.
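The six dependencies ticked above are exactly the `require` lines built with `gnu-9.2.0`. The packages script may list its dependencies as `require MODULE/VERSION[/TOOLCHAIN]` lines, in the same form as the list above. That is an assumption, but if it holds, a hypothetical helper like the one below could produce the to-do list automatically.

```shell
#!/usr/bin/env bash
# list_requires_built_with TOOLCHAIN FILE
# Hypothetical helper: print the module name (first path component) of every
# `require` line in FILE whose module string ends in /TOOLCHAIN.
list_requires_built_with() {
    local toolchain="$1" file="$2"
    awk -v tc="/$toolchain" '
        $1 == "require" {
            mod = $2
            # Keep only modules whose name ends with the toolchain suffix.
            if (substr(mod, length(mod) - length(tc) + 1) == tc) {
                split(mod, parts, "/")
                print parts[1]
            }
        }' "$file"
}
```

Under that assumption, `list_requires_built_with gnu-9.2.0 R-4.1.1_packages_install` should print sqlite, proj.4, gdal, geos, gmt and protobuf, one per line.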

balston, Sep 06 '21

Updating SQLite to 3.36.0 to match the version used by the RSQLite package. Build using:

module -f unload compilers mpi gcc-libs
module load beta-modules
module load gcc-libs/10.2.0
module load compilers/gnu/10.2.0
./SQLite-3.36.0_install 2>&1 | tee ~/Software/R/SQLite-3.36.0_install.log

balston, Sep 06 '21

For the geographic packages (PROJ.4, GDAL, GEOS and GMT), it is normally best to build the latest stable versions, as they all interact with each other. Starting with:

  • PROJ.4, which is at version 8.1.1 - https://proj.org/index.html

balston, Sep 06 '21

Build PROJ.4 using the above modules and:

module load sqlite/3.36.0/gnu-10.2.0
./PROJ.4-8.1.1_install 2>&1 | tee ~/Software/PROJ.4/PROJ.4-8.1.1_install.log

balston, Sep 06 '21

Build has completed. Checking and then updating module file.

balston, Sep 06 '21

  • GDAL 3.3.2 - https://gdal.org/

Build using the above modules and:

module load proj.4/8.1.1/gnu-10.2.0
./gdal-3.3.2_install 2>&1 | tee ~/Software/GDAL/gdal-3.3.2_install.log

balston, Sep 07 '21

I'm getting a warning during the build:

configure: WARNING: Can not find SQLITE_VERSION macro in sqlite3.h header to retrieve SQLite version!

Investigating ...

balston, Sep 07 '21

Trying out a fix to the build script ...

balston, Sep 07 '21

The fix seems to have worked. I had an extra directory at the end of the SQLITE3 environment variable in the build script; I must have copied and pasted too much! It's now set correctly to:

/shared/ucl/apps/SQLite/3360000

Updating module file next.

Note: build takes about 1 hour on Kathleen.
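The configure warning came from looking for `sqlite3.h` under the wrong directory. Because the build takes about an hour, a quick pre-flight check on the SQLite prefix could catch this kind of mistake before the build starts. The sketch below assumes the prefix uses the usual `include/sqlite3.h` layout. `check_sqlite3_prefix` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# check_sqlite3_prefix DIR
# Hypothetical pre-flight check: confirm that DIR/include/sqlite3.h exists
# and defines the SQLITE_VERSION macro that GDAL's configure looks for.
check_sqlite3_prefix() {
    local header="$1/include/sqlite3.h"
    if [ ! -f "$header" ]; then
        echo "no sqlite3.h under $1/include" >&2
        return 1
    fi
    # The trailing space avoids matching SQLITE_VERSION_NUMBER.
    if ! grep -q '^#define SQLITE_VERSION ' "$header"; then
        echo "$header does not define SQLITE_VERSION" >&2
        return 1
    fi
}
```

With the corrected path, `check_sqlite3_prefix /shared/ucl/apps/SQLite/3360000` should pass. With an extra directory appended, it would fail immediately.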

balston, Sep 07 '21

  • GEOS 3.9.1 - https://trac.osgeo.org/geos

Build using the above modules and:

./geos-3.9.1_install 2>&1 | tee ~/Software/GEOS/geos-3.9.1_install.log

balston, Sep 07 '21

The build script had the wrong gcc-libs in its prepare_module function, so I've updated it and will run the build again.

balston, Sep 07 '21

  • GMT 6.2.0 - https://www.generic-mapping-tools.org/

balston, Sep 07 '21

There are a couple of minor issues. GDAL now depends on Python 3, so:

  • Cannot use the existing Python bundles, as they depend on the old OpenBLAS (GNU 4.9.2 build);
  • Need to build GMT with Python 3 rather than Python 2 if possible.

Try using:

require python/3.9.6

and updating the GDAL module accordingly.

balston, Sep 08 '21

Rebuilding GDAL with the updated Python3 dependency.

balston, Sep 08 '21

This didn't work:

WARNING: numpy not available!  Array support will not be enabled
Traceback (most recent call last):
  File "setup.py", line 350, in <module>
    readme = open('README.rst', encoding="utf-8").read()
TypeError: 'encoding' is an invalid keyword argument for this function
make[2]: *** [build] Error 1
make[2]: Leaving directory `/dev/shm/3.3.2/tmp.zo0b5geH1j/gdal-3.3.2/swig/python'
make[1]: *** [build] Error 2
make[1]: Leaving directory `/dev/shm/3.3.2/tmp.zo0b5geH1j/gdal-3.3.2/swig'
make: *** [swig-modules] Error 2

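The TypeError is what Python 2's open() raises when passed an encoding argument, so setup.py was being run by Python 2 rather than Python 3, and numpy was not available either. Looks like a full Python 3 bundle for GNU 10.2.0 is needed.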
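The log above shows two separate failures: the Python 2 style `TypeError` and the missing numpy. Both could be checked for before starting another long GDAL build. The sketch below passes the interpreter to check explicitly, because which interpreter the GDAL build actually picks up is an assumption to verify separately. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# python_major INTERP: print the major version number of interpreter INTERP.
python_major() {
    "$1" -c 'import sys; print(sys.version_info[0])'
}

# require_python3 INTERP
# Hypothetical pre-flight check: fail unless INTERP is Python 3 and can
# import numpy (needed for GDAL's array support in the Python bindings).
require_python3() {
    local interp="$1" major
    major="$(python_major "$interp")" || return 1
    if [ "$major" != "3" ]; then
        echo "$interp is Python $major, need Python 3" >&2
        return 1
    fi
    if ! "$interp" -c 'import numpy' 2>/dev/null; then
        echo "numpy cannot be imported with $interp" >&2
        return 1
    fi
}
```

After loading a new Python 3 bundle, running `require_python3 python3` (or whichever interpreter name the GDAL build uses) would show whether both problems are resolved.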

balston, Sep 08 '21