rcps-buildscripts
Install Request: Upgrade R and Bioconductor to R 4.1.1 and Bioconductor 3.13
We need to install the latest versions of R and Bioconductor on the clusters and RStudio services, as users are beginning to use packages that depend on R 4.1, e.g. IN04760228.
R: https://www.r-project.org/ Bioconductor: https://www.bioconductor.org/
Note: the most recent versions currently installed on Myriad, Kathleen and the Data Science Platform are:
R: 4.0.2 Bioconductor: 3.11
IN04757651 may also need R 4.1.
IN04807854 - request for R 4.1 for INLA:
I am working with the package INLA and the currently installed version is over 2 years old, which makes a huge difference as INLA has changed substantially in the last year. I am able to install the latest INLA version but this runs into problems due to the outdated R version.
Latest R is 4.1.1 (released 10th August 2021) so we will be installing this version.
Building on Kathleen.
Check prerequisites for base R
External prerequisites. Already have:
- gcc-libs/10.2.0
- compilers/gnu/10.2.0
- openblas/0.3.13-native-threads/gnu-10.2.0
Need to build:
- [x] fftw for GNU 10.2.0;
- [x] GSL for GNU 10.2.0;
- [x] HDF5 for GNU 10.2.0;
- [x] udunits for GNU 10.2.0;
- [x] NetCDF for GNU 10.2.0;
- [x] PCRE2 for GNU 10.2.0;
Build fftw using:
cd /shared/ucl/apps/build_scripts
module -f unload compilers mpi gcc-libs
module load beta-modules
module load gcc-libs/10.2.0
module load compilers/gnu/10.2.0
./fftw-3.3.9_gnu-10.2.0_install 2>&1 | tee ~/Software/FFTW/fftw-3.3.9_gnu-10.2.0_install.log
Build GSL using above modules and:
./gsl-2.7-install 2>&1 | tee ~/Software/GSL/gsl-2.7_install.log
There is already an hdf5-1.10.6-gcc1020_install script that does the MPI build of HDF5. I'm modifying a copy of it to do the serial build, which is what R needs.
Build HDF5 using:
./hdf5-1.10.6-gcc1020-serial_install 2>&1 | tee ~/Software/HDF5/hdf5-1.10.6-gcc1020-serial_install.log
Build UDUNITS using above modules and:
./udunits-2.2.28_install 2>&1 | tee ~/Software/netcdf/udunits-2.2.28_install.log
Build NetCDF using the above modules and:
module load hdf/5-1.10.6/gnu-10.2.0
module load udunits/2.2.28/gnu-10.2.0
./netcdf_4.8.1_install 2>&1 | tee ~/Software/netcdf/netcdf_4.8.1_install.log
Build PCRE2 using the above modules and:
./pcre2-10.37_install 2>&1 | tee ~/Software/pcre2-10.37_install.log
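One caveat with the "script 2>&1 | tee log" pattern used for these builds: the exit status of the pipeline is that of tee, so a failed build script still looks successful to the shell. Below is a minimal sketch of a wrapper that keeps the log but returns the build script's own status. The name run_logged is a hypothetical helper, not something that exists in build_scripts.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: run COMMAND, copy its stdout and stderr to LOGFILE
# via tee, and return COMMAND's exit status instead of tee's.
run_logged() {
    log=$1
    shift
    status_file=$(mktemp) || return 1
    {
        rc=0
        "$@" 2>&1 || rc=$?
        # The left side of the pipe runs in a subshell, so pass the
        # status back through a temporary file.
        echo "$rc" > "$status_file"
    } | tee "$log"
    status=$(cat "$status_file")
    rm -f "$status_file"
    return "${status:-1}"
}

# Example, using the fftw step from above:
# run_logged ~/Software/FFTW/fftw-3.3.9_gnu-10.2.0_install.log ./fftw-3.3.9_gnu-10.2.0_install
```

This version sticks to POSIX sh. In bash, "set -o pipefail" or checking PIPESTATUS would achieve the same with less code.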
Build base R
Now ready to build the base R plus CRAN recommended additional packages. The build script includes the following module requirements (updated for the correct OpenBLAS module):
require beta-modules
require gcc-libs/10.2.0
require compilers/gnu/10.2.0
require openblas/0.3.13-serial/gnu-10.2.0
require java/1.8.0_92
require fftw/3.3.9/gnu-10.2.0
require ghostscript/9.19/gnu-4.9.2
require texinfo/6.6/gnu-4.9.2
require texlive/2019
require gsl/2.7/gnu-10.2.0
require hdf/5-1.10.6/gnu-10.2.0
require udunits/2.2.28/gnu-10.2.0
require netcdf/4.8.1/gnu-10.2.0
require pcre2/10.37/gnu-10.2.0
For the base build no MPI module is needed. Building using:
module -f unload compilers mpi gcc-libs
./R-4.1.1_install 2>&1 | tee ~/Software/R/R-4.1.1_install.log
Base R is now installed on Kathleen. The next step is to update the module file, run some simple tests, and then add the additional packages.
Module file added and basic tests run using the following module files:
module load beta-modules
module load gcc-libs/10.2.0
module load compilers/gnu/10.2.0
module load openblas/0.3.13-native-threads/gnu-10.2.0
module load java/1.8.0_92
module load fftw/3.3.9/gnu-10.2.0
module load ghostscript/9.19/gnu-4.9.2
module load texinfo/6.6/gnu-4.9.2
module load texlive/2019
module load gsl/2.7/gnu-10.2.0
module load hdf/5-1.10.6/gnu-10.2.0
module load udunits/2.2.28/gnu-10.2.0
module load netcdf/4.8.1/gnu-10.2.0
module load pcre2/10.37/gnu-10.2.0
module load r/4.1.1-openblas/gnu-10.2.0
Check and Update Prerequisites for Additional R Packages and Bioconductor
The next step is to install the additional R packages using the R-4.1.1_packages_install script in build_scripts. Before this can be run, a number of external prerequisites need to be checked and installed or updated if necessary. In addition to the modules above, the following are required:
require perl/5.22.0
require libtool/2.4.6
require freetype/2.8.1/gnu-4.9.2
require graphicsmagick/1.3.21
require python/2.7.12
require sqlite/3.31.1/gnu-9.2.0
require proj.4/7.0.0/gnu-9.2.0
require gdal/3.0.4/gnu-9.2.0
require geos/3.8.1/gnu-9.2.0
require gmt/6.0.0/gnu-9.2.0
require v8/3.15
require protobuf/3.14.0/gnu-9.2.0
require jq/1.5/gnu-4.9.2
require plink/1.90b3.40
These are the versions from the R 4.0.2 build so the following will definitely need to be updated:
- [x] sqlite;
- [x] proj.4;
- [x] gdal;
- [x] geos;
- [x] gmt;
- [x] protobuf;
Working on these next. Don't need to update V8 at this time.
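The six modules in the checklist are the ones whose names end in gnu-9.2.0 in the require list. A minimal sketch of how that subset could be pulled out mechanically is shown here. The helper name requires_built_with is hypothetical, and treating the toolchain suffix as the selection rule is an assumption drawn from the list itself.

```shell
# requires_built_with TOOLCHAIN
# Hypothetical helper: read "require <module>" lines on stdin and print the
# name of every module whose path ends in /TOOLCHAIN (e.g. gnu-9.2.0).
requires_built_with() {
    toolchain=$1
    awk -v suffix="/$toolchain" '
        $1 == "require" {
            n = length($2) - length(suffix)
            # Compare the tail of the module path with the wanted suffix.
            if (n > 0 && substr($2, n + 1) == suffix) {
                split($2, parts, "/")
                print parts[1]
            }
        }'
}

# Example: save the require lines to a file (requires.txt is a placeholder name)
# requires_built_with gnu-9.2.0 < requires.txt
```

Fed the require list above with gnu-9.2.0, this prints sqlite, proj.4, gdal, geos, gmt and protobuf, which matches the checklist.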
Updating SQLite to 3.36.0 as per the RSQLite package version. Build using:
module -f unload compilers mpi gcc-libs
module load beta-modules
module load gcc-libs/10.2.0
module load compilers/gnu/10.2.0
./SQLite-3.36.0_install 2>&1 | tee ~/Software/R/SQLite-3.36.0_install.log
For the geographic packages (proj.4, gdal, geos and gmt) it is normally best to build the latest stable versions, as they all interact with each other. Starting with:
- PROJ.4, which is at version 8.1.1 - https://proj.org/index.html
Build PROJ.4 using above modules and:
module load sqlite/3.36.0/gnu-10.2.0
./PROJ.4-8.1.1_install 2>&1 | tee ~/Software/PROJ.4/PROJ.4-8.1.1_install.log
Build has completed. Checking and then updating module file.
- GDAL 3.3.2 - https://gdal.org/
Build using the above modules and:
module load proj.4/8.1.1/gnu-10.2.0
./gdal-3.3.2_install 2>&1 | tee ~/Software/GDAL/gdal-3.3.2_install.log
I'm getting a warning during the build:
configure: WARNING: Can not find SQLITE_VERSION macro in sqlite3.h header to retrieve SQLite version!
Investigating ...
Trying out a fix to the build script ...
The fix seems to have worked. I had an extra directory at the end of the SQLITE3 environment variable in the build script. I must have copied and pasted too much! It's now set correctly to:
/shared/ucl/apps/SQLite/3360000
Updating module file next.
Note: build takes about 1 hour on Kathleen.
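Since the SQLITE_VERSION warning was resolved by correcting the SQLITE3 path and a rebuild costs about an hour, a pre-flight check on the prefix may be worth having. The sketch below assumes the header lives at include/sqlite3.h under the prefix, which is the conventional layout but has not been checked against this install. The name check_sqlite_prefix is hypothetical.

```shell
# check_sqlite_prefix PREFIX
# Hypothetical helper: succeed only if PREFIX/include/sqlite3.h exists and
# defines the SQLITE_VERSION macro that configure looks for.
check_sqlite_prefix() {
    header="$1/include/sqlite3.h"
    if [ ! -f "$header" ]; then
        echo "no sqlite3.h under $1/include" >&2
        return 1
    fi
    # The trailing whitespace class excludes SQLITE_VERSION_NUMBER.
    grep -q '^#define[[:space:]][[:space:]]*SQLITE_VERSION[[:space:]]' "$header"
}

# Example, using the corrected path from above:
# check_sqlite_prefix /shared/ucl/apps/SQLite/3360000 || echo "SQLITE3 looks wrong"
```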
- GEOS 3.9.1 - https://trac.osgeo.org/geos
Build using the above modules and:
./geos-3.9.1_install 2>&1 | tee ~/Software/GEOS/geos-3.9.1_install.log
It had the wrong gcc-libs in the prepare_module function so I've updated the build script and will run the build again.
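A stale gcc-libs version copied over from an older build script is easy to miss. Here is a minimal sketch for listing every gcc-libs version a script mentions, so that a mismatch stands out before the build is run. It assumes the script refers to the module as gcc-libs/<version>, as the module commands above do. The name gcc_libs_versions is hypothetical.

```shell
# gcc_libs_versions SCRIPT
# Hypothetical helper: print each distinct gcc-libs/<version> string found
# in SCRIPT. More than one line of output suggests a leftover from an
# older toolchain.
gcc_libs_versions() {
    grep -o 'gcc-libs/[0-9][0-9.]*' "$1" | sort -u
}

# Example:
# gcc_libs_versions ./geos-3.9.1_install
```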
- GMT 6.2.0 - https://www.generic-mapping-tools.org/
There are a couple of minor issues - GDAL now depends on Python 3 so:
- Cannot use an existing Python bundle, as they depend on the old OpenBLAS (GNU 4.9.2 version);
- Need to build GMT with Python3 and not Python2 if possible.
Try using:
require python/3.9.6
and updating the GDAL module accordingly.
Rebuilding GDAL with the updated Python3 dependency.
This didn't work:
WARNING: numpy not available! Array support will not be enabled
Traceback (most recent call last):
  File "setup.py", line 350, in <module>
    readme = open('README.rst', encoding="utf-8").read()
TypeError: 'encoding' is an invalid keyword argument for this function
make[2]: *** [build] Error 1
make[2]: Leaving directory `/dev/shm/3.3.2/tmp.zo0b5geH1j/gdal-3.3.2/swig/python'
make[1]: *** [build] Error 2
make[1]: Leaving directory `/dev/shm/3.3.2/tmp.zo0b5geH1j/gdal-3.3.2/swig'
make: *** [swig-modules] Error 2
Looks like a full Python3 bundle for GNU 10.2.0 is needed.
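The TypeError in the traceback is characteristic of Python 2, whose built-in open() has no encoding argument, so setup.py was most likely run by a Python 2 interpreter. The following is a minimal sketch for checking which major version an interpreter reports before starting another build. The name python_major is hypothetical, and whether the GDAL build picks up "python" from PATH is an assumption.

```shell
# python_major [INTERPRETER]
# Hypothetical helper: print the major version reported by INTERPRETER
# (default: whatever "python" resolves to on PATH).
python_major() {
    # Python 2 writes its version to stderr and Python 3 to stdout,
    # so merge the two streams before parsing.
    "${1:-python}" --version 2>&1 | sed -n 's/^Python \([0-9][0-9]*\)\..*/\1/p'
}

# Example:
# [ "$(python_major)" = "3" ] || echo "python on PATH is not Python 3"
```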