bioconductor_docker

Move dependencies out of Dockerfile

Open jwokaty opened this issue 3 years ago • 9 comments

This PR creates Ubuntu-files to track docker-specific dependencies and skip dependencies that we don't want installed from BBS.

Regarding libmariadb-dev-compat, I've put it in apt_required.txt but commented it out because it depends on libmariadb-dev and conflicts with libmysqlclient-dev, which gets installed on the build system. We can choose to skip libmysqlclient-dev by putting it in apt_skip.txt and uncommenting libmariadb-dev-compat in apt_required.txt so that it gets installed.

bin/install.sh installs BBS dependencies, comparing the BBS Ubuntu-files to this repo's Ubuntu-files.
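
A minimal sketch of that comparison, assuming each Ubuntu-file holds one package name per line with optional `#` comments. The function name `packages_to_install` and the file layout are illustrative assumptions, not the actual contents of bin/install.sh:

```shell
# Hypothetical helper (not taken from bin/install.sh).
# Print the packages named in the wanted list ($1) that are not named in
# the skip list ($2). Comments and blank lines are ignored in both files.
packages_to_install() {
    awk -v skipfile="$2" '
        { sub(/#.*/, "") }                            # drop comments
        NF == 0 { next }                              # ignore blank lines
        FILENAME == skipfile { skip[$1] = 1; next }   # first file: remember skips
        !($1 in skip) && !seen[$1]++ { print $1 }     # second file: keep the rest once
    ' "$2" "$1"
}
```

For example, `packages_to_install apt_required.txt apt_skip.txt` would print every uncommented package in apt_required.txt that is not named in apt_skip.txt, so moving libmysqlclient-dev into the skip list removes it from the output.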

When I tested, I was able to install the following packages:

 [1] "a4"             "a4Base"         "bioCancer"      "BioMM"         
 [5] "BLMA"           "bnbc"           "canceR"         "ChemmineOB"    
 [9] "cicero"         "CoGAPS"         "ctgGEM"         "CytoTree"      
[13] "edge"           "GeneTonic"      "gpuMagic"       "igvR"          
[17] "methylscaper"   "monocle"        "phemd"          "podkat"        
[21] "projectR"       "RCyjs"          "spatialHeatmap" "tenXplore"     
[25] "tradeSeq"       "Travel"         "uSORT"          "webbioc"    

On 6/29, docker images reported the size as 4.57GB.

I'd appreciate any feedback to improve this PR. You can see my PR for BBS at https://github.com/Bioconductor/BBS/pull/84.

jwokaty, Jun 14 '21 17:06

Thanks for the PR @jwokaty.

The docker image is currently much larger, by about 750MB:

bioconductor/bioconductor_docker                              jw-update              337ddea798d9   45 hours ago   4.65GB
bioconductor/bioconductor_docker                              devel                  1458fe590fe7   46 hours ago   3.93GB

The key questions for this image are:

  1. Does this PR make the bioconductor/bioconductor_docker:devel image the "same" as the BBS linux machine? (was that the goal here? if so, we could make a new image bioconductor_docker:linux_builder --> We still need to test if it's the same as the BBS machine though.)

  2. What are the "extra" 750MB worth of system dependencies?

  3. One thing that makes this PR a little complicated for me to read is that the packages being installed aren't "explicit" anymore. They are buried in the apt_*.txt files and then within the awk commands.

I'm happy to help on any of these, and welcome thoughts from @jwokaty, @hpages, @vjcitn and @mtmorgan .

nturaga, Jul 01 '21 13:07

I took a look at Dockerfile, went thru the list of deb packages that are explicitly listed in the file, and annotated them. This should help us decide what to do with each of them. The goal is that each deb package should go in one of the following lists:

  1. apt_required_build.txt
  2. apt_required_compile_R.txt
  3. apt_optional_compile_R.txt
  4. apt_extra_fonts.txt
  5. apt_cran.txt
  6. apt_bioc.txt
  7. apt_nice_to_have.txt
  8. apt_docker_only.txt

All these lists (except the last one) are in https://github.com/Bioconductor/BBS/tree/master/Ubuntu-files/20.04/. The last one (apt_docker_only.txt) would need to be created. It would list stuff that is maybe nice to have on the Docker image for developers but is not strictly required to install/run Bioconductor. Your input will be valuable @nturaga to decide whether or not you want to keep these things on the Docker image.

Here's the annotated list extracted from Dockerfile:

	## Basic deps
	gdb \                          add to apt_nice_to_have.txt
	libxml2-dev \                  already in apt_cran.txt
	python3-pip \                  already in apt_required_build.txt
	libz-dev \                     who needs that? maybe create a new list
                                       (e.g. apt_docker_only.txt) and add to it
	liblzma-dev \                  already in apt_required_compile_R.txt
	libbz2-dev \                   already in apt_required_compile_R.txt
	libpng-dev \                   already in apt_optional_compile_R.txt
	libgit2-dev \                  already in apt_cran.txt
	## sys deps from bioc_full
	pkg-config \                   add to apt_nice_to_have.txt
	fortran77-compiler \           we use gfortran (in apt_required_compile_R.txt) on the build
                                       machines
	byacc \                        who needs that? maybe add to apt_docker_only.txt
	automake \                     already in apt_bioc.txt
	curl \                         we use libcurl4-openssl-dev on the build
                                       machines (needed for CRAN packages RCurl
                                       and curl)
	## This section installs libraries
	libpcre2-dev \                 already in apt_required_compile_R.txt
	libnetcdf-dev \                already in apt_bioc.txt
	libhdf5-serial-dev \           who needs that? maybe add to apt_docker_only.txt
	libfftw3-dev \                 already in apt_cran.txt
	libopenbabel-dev \             already in apt_bioc.txt
	libopenmpi-dev \               we use mpi-default-dev (apt_cran.txt) on the build machines
	libxt-dev \                    already in apt_required_compile_R.txt
	libudunits2-dev \              already in apt_cran.txt
	libgeos-dev \                  already in apt_cran.txt
	libproj-dev \                  already in apt_cran.txt
	libcairo2-dev \                already in apt_optional_compile_R.txt
	libtiff5-dev \                 we use libtiff-dev (in apt_optional_compile_R.txt) on the
                                       build machines
	libreadline-dev \              already in apt_required_compile_R.txt
	libgsl0-dev \                  we use libgsl-dev (in apt_bioc.txt) on the build machines
	libgslcblas0 \                 who needs that? maybe add to apt_docker_only.txt
	libgtk2.0-dev \                already in apt_cran.txt
	libgl1-mesa-dev \              gets automatically installed by libglu1-mesa-dev so maybe no
                                       need for an explicit install
	libglu1-mesa-dev \             already in apt_cran.txt
	libgmp3-dev \                  we use libgmp-dev (in apt_cran.txt) on the build machines
	libhdf5-dev \                  who needs that? maybe add to apt_docker_only.txt
	libncurses-dev \               gets automatically installed by libreadline-dev but maybe
                                       add it to apt_required_compile_R.txt anyway just in case
	libbz2-dev \                   already in apt_required_compile_R.txt
	libxpm-dev \                   who needs that? maybe add to apt_docker_only.txt
	liblapack-dev \                we don't use the system LAPACK library on the build machines
	libv8-dev \                    already in apt_cran.txt
	libgtkmm-2.4-dev \             already in apt_bioc.txt
	libmpfr-dev \                  already in apt_cran.txt
	libmodule-build-perl \         who needs that? maybe add to apt_docker_only.txt
	libapparmor-dev \              who needs that? maybe add to apt_docker_only.txt
	libprotoc-dev \                who needs that? maybe add to apt_docker_only.txt
	librdf0-dev \                  who needs that? maybe add to apt_docker_only.txt
	libmagick++-dev \              already in apt_cran.txt
	libsasl2-dev \                 already in apt_cran.txt
	libpoppler-cpp-dev \           already in apt_cran.txt
	libprotobuf-dev \              already in apt_cran.txt
	libpq-dev \                    already in apt_cran.txt
	libperl-dev \                  already in apt_cran.txt
	## software - perl extentions and modules
	libarchive-extract-perl \      who needs that? maybe add to apt_docker_only.txt
	libfile-copy-recursive-perl \  who needs that? maybe add to apt_docker_only.txt
	libcgi-pm-perl \               who needs that? maybe add to apt_docker_only.txt
	libdbi-perl \                  who needs that? maybe add to apt_docker_only.txt
	libdbd-mysql-perl \            who needs that? maybe add to apt_docker_only.txt
	libxml-simple-perl \           who needs that? maybe add to apt_docker_only.txt
	libmysqlclient-dev \           already in apt_cran.txt
	default-libmysqlclient-dev \   not needed (redundant with libmysqlclient-dev)
	libgdal-dev \                  already in apt_cran.txt
	## new libs
        libglpk-dev \                  already in apt_cran.txt
        libeigen3-dev \                already in apt_bioc.txt
	## Databases and other software
	sqlite \                       not needed to install/run Bioconductor so maybe add to
                                       apt_docker_only.txt
	openmpi-bin \                  only mpi-default-dev (apt_cran.txt) is strictly needed to
                                       install/run Bioconductor so maybe add to apt_docker_only.txt
	mpi-default-bin \              only mpi-default-dev (apt_cran.txt) is strictly needed to
                                       install/run Bioconductor so maybe add to apt_docker_only.txt
	openmpi-common \               only mpi-default-dev (apt_cran.txt) is strictly needed to
                                       install/run Bioconductor so maybe add to apt_docker_only.txt
	openmpi-doc \                  only mpi-default-dev (apt_cran.txt) is strictly needed to
                                       install/run Bioconductor so maybe add to apt_docker_only.txt
	tcl8.6-dev \                   we use tcl-dev (apt_optional_compile_R.txt) on the build machines
	tk-dev \                       already in apt_optional_compile_R.txt
	default-jdk \                  already in apt_optional_compile_R.txt
	imagemagick \                  not needed to install/run Bioconductor so maybe add to
                                       apt_docker_only.txt
	tabix \                        not needed to install/run Bioconductor so maybe add to
                                       apt_docker_only.txt
	ggobi \                        who needs that? maybe add to apt_docker_only.txt
	graphviz \                     already in apt_bioc.txt
	protobuf-compiler \            already in apt_cran.txt
	jags \                         already in apt_cran.txt
	## Additional resources
	xfonts-100dpi \                already in apt_extra_fonts.txt
	xfonts-75dpi \                 already in apt_extra_fonts.txt
	biber \                        AFAIK this is only needed to build some vignettes so we have
                                       it listed in apt_vignettes_reference_manuals.txt and that's
                                       a list that we do not want to install on the Docker image
        libsbml5-dev \                 already in apt_bioc.txt
        libzmq3-dev \                  who needs that? maybe add to apt_docker_only.txt

## FIXME
## These two libraries don't install in the above section--WHY?
RUN apt-get update \
	&& apt-get -y --no-install-recommends install \
	libmariadb-dev-compat \        not needed to install/run Bioconductor so maybe add to
                                       apt_docker_only.txt if you really want this on the Docker
                                       image
	libjpeg-dev \                  already in apt_optional_compile_R.txt
	libjpeg-turbo8-dev \           installing libjpeg-dev should be enough
	libjpeg8-dev \                 installing libjpeg-dev should be enough

Note that I've left the following section from Dockerfile out of the discussion for now:

## Python installations
RUN apt-get update \
	&& apt-get install -y software-properties-common \
	&& add-apt-repository universe \
	&& apt-get update \
	&& apt-get -y --no-install-recommends install python2 python-dev \
	&& curl https://bootstrap.pypa.io/pip/2.7/get-pip.py --output get-pip.py \
	&& python2 get-pip.py \
	&& pip2 install wheel \
	## Install sklearn and pandas on python
	&& pip2 install sklearn \
	pandas \
	pyyaml \
	&& apt-get clean \
	&& rm -rf /var/lib/apt/lists/* \
	&& rm -rf get-pip.py

because I'm not sure what to do with it or why it is needed. We do need some Python modules on the build machines but they should all be installed for Python 3, not Python 2 (we dropped support for Python 2 last year).

The goal is that in the future we'll only need to add new deb packages to the apt_cran.txt and/or apt_bioc.txt lists as new (or existing) Bioconductor packages introduce new system requirements. This will impact what gets installed on both the build machines and the Docker image.

Hope this helps, H.

hpages, Jul 01 '21 21:07

So in docker, we should be installing apt dependencies from the following files:

  • apt_bioc.txt
  • apt_cran.txt
  • apt_optional_compile_R
  • apt_required_compile_R
  • apt_extra_fonts
  • apt_required_compile_R
  • apt_required_build_R

We're not installing apt_nice_to_have and apt_vignettes_reference_manuals--is that correct?

I also want to clarify that when one of these files has packages not listed in the current Dockerfile in the master branch, that we still install everything in the file. For example, the dockerfile on the master branch lists only 2 font packages; however, apt_extra_fonts has 8 total packages. So we will still be installing more packages than the current docker on the master branch, but at least what we're installing is explicit.

Additionally, when the docker and build systems have a similar package, we should choose the build system package that's in one of the files listed, correct?

jwokaty, Jul 08 '21 13:07

Hi Jennifer,

These are all very good questions. I will try to answer the ones I’m able to.

> So in docker, we should be installing apt dependencies from the following files:
>
>   • apt_bioc.txt
>   • apt_cran.txt
>   • apt_optional_compile_R
>   • apt_required_compile_R
>   • apt_extra_fonts
>   • apt_required_compile_R
>   • apt_required_build_R
>
> We're not installing apt_nice_to_have and apt_vignettes_reference_manuals--is that correct?
>
> I also want to clarify that when one of these files has packages not listed in the current Dockerfile in the master branch, that we still install everything in the file. For example, the dockerfile on the master branch lists only 2 font packages; however, apt_extra_fonts has 8 total packages. So we will still be installing more packages than the current docker on the master branch, but at least what we're installing is explicit.

Somehow my experience so far has been that these apt_extra_fonts are needed only to ‘build’ vignettes. Is there a way to narrow down what exactly these 6 extra fonts are needed for?

One thing to remember is that we are inheriting some dependencies from our parent docker image from rocker.

So the inheritance goes like this,

   ```
   ubuntu/latest --> rocker/r-ver --> rocker/rstudio --> bioconductor/bioconductor_docker
   ```

There are dependencies which are already pre-installed and inherited from

rocker/r-ver - ( https://github.com/rocker-org/rocker-versioned2/blob/master/scripts/install_R.sh)

rocker/rstudio - (https://github.com/rocker-org/rocker-versioned2/blob/master/scripts/install_pandoc.sh)

And as Martin pointed out previously, because docker images are built from file system ‘layers’, each additional installation of the same software gives the impression of overwriting, but we are simply adding layers and increasing size.

> Additionally, when the docker and build systems have a similar package, we should choose the build system package that's in one of the files listed, correct?

This should be correct, though we may still make changes once we complete testing. To elaborate on testing: we’ll be building and installing all the 2000+ Bioconductor packages and their dependencies, and we’ll see how that goes.

Specifically, something like this, where ‘pkg’ is a vector of Bioconductor packages.

  BiocManager::install(pkg,
                       INSTALL_opts = "--build",
                       update = FALSE,
                       quiet = TRUE,
                       force = TRUE,
                       keep_outputs = TRUE)

I’m not sure if this answers your questions, but I’m happy to get on a call and discuss the solution some more.

nturaga, Jul 08 '21 14:07

@nturaga Thanks for trying to answer some of my questions as well as pointing me to the rocker scripts.

I'm not sure if there's a better way to investigate these dependencies, but I decided to use code.bioconductor.org to investigate the "who needs that" packages. I will look there too for these files, but they're probably dependencies of non-Bioconductor packages.

If these are for building vignettes and we don't build vignettes with docker, why are we installing them? The fonts are just one group of files where I know there are more in the BBS than in docker. I suspect there will be others.

jwokaty, Jul 08 '21 15:07

I marked all the packages that are in both docker and the BBS:

apt_bioc.txt:graphviz                    # for Rgraphviz             # Bioconductor Docker
apt_bioc.txt:libgtkmm-2.4-dev            # for HilbertVisGUI         # Bioconductor Docker
apt_bioc.txt:libgsl-dev                  # for GSL                   # Bioconductor Docker
apt_bioc.txt:libsbml5-dev                # for rsbml                 # Bioconductor Docker
apt_bioc.txt:automake                    # for RProtoBufLib          # Bioconductor Docker
apt_bioc.txt:libnetcdf-dev               # for mzR, RNetCDF          # Bioconductor Docker
apt_bioc.txt:libopenbabel-dev            # for ChemmineOB            # Bioconductor Docker
apt_bioc.txt:libeigen3-dev               # for ChemmineOB            # Bioconductor Docker
apt_cran.txt:libglu1-mesa-dev        # for rgl                       # Bioconductor Docker
apt_cran.txt:libgmp-dev              # for gmp                       # Bioconductor Docker
apt_cran.txt:libsasl2-dev            # for mongolite                 # Bioconductor Docker
apt_cran.txt:libxml2-dev             # for XML                       # Bioconductor Docker
apt_cran.txt:libcurl4-openssl-dev    # for RCurl, curl               # Bioconductor Docker
apt_cran.txt:mpi-default-dev         # for Rmpi                      # Bioconductor Docker
apt_cran.txt:libudunits2-dev         # for units                     # Bioconductor Docker
apt_cran.txt:libv8-dev               # for V8                        # Bioconductor Docker
apt_cran.txt:libmpfr-dev             # for Rmpfr                     # Bioconductor Docker
apt_cran.txt:libfftw3-dev            # for fftw, fftwtools           # Bioconductor Docker
apt_cran.txt:libmysqlclient-dev      # for RMySQL                    # Bioconductor Docker
apt_cran.txt:libpq-dev               # for RPostgreSQL, RPostgres    # Bioconductor Docker
apt_cran.txt:libmagick++-dev         # for magick                    # Bioconductor Docker
apt_cran.txt:libgeos-dev             # for rgeos                     # Bioconductor Docker
apt_cran.txt:libproj-dev             # for proj4                     # Bioconductor Docker
apt_cran.txt:libgdal-dev             # for sf                        # Bioconductor Docker
apt_cran.txt:libpoppler-cpp-dev      # for pdftools                  # Bioconductor Docker
apt_cran.txt:libgtk2.0-dev           # for RGtk2                     # Bioconductor Docker
apt_cran.txt:libgit2-dev             # for gert                      # Bioconductor Docker
apt_cran.txt:jags                    # for rjags                     # Bioconductor Docker
apt_cran.txt:libprotobuf-dev         # for protolite                 # Bioconductor Docker 
apt_cran.txt:protobuf-compiler       # for protolite                 # Bioconductor Docker
apt_cran.txt:libglpk-dev             # for glpkAPI and to compile igraph with GLPK support   # Bioconductor Docker
apt_extra_fonts.txt:xfonts-100dpi                                    # Bioconductor Docker
apt_extra_fonts.txt:xfonts-75dpi                                     # Bioconductor Docker
apt_nice_to_have.txt:gdb                                             # Bioconductor Docker        (suggested add from above)
apt_nice_to_have.txt:pkg-config                                      # Bioconductor Docker       (suggested add from above)
apt_optional_compile_R.txt:libpng-dev                                # Bioconductor Docker
apt_optional_compile_R.txt:libjpeg-dev                               # Bioconductor Docker
apt_optional_compile_R.txt:libtiff-dev                               # Bioconductor Docker
apt_optional_compile_R.txt:libcairo2-dev                             # Bioconductor Docker
apt_optional_compile_R.txt:tcl-dev                                   # Bioconductor Docker
apt_optional_compile_R.txt:tk-dev                                    # Bioconductor Docker
apt_optional_compile_R.txt:default-jdk                               # Bioconductor Docker
apt_required_build.txt:python3-pip                                   # Bioconductor Docker
apt_required_compile_R.txt:gfortran                                  # Bioconductor Docker
apt_required_compile_R.txt:libreadline-dev                           # Bioconductor Docker
apt_required_compile_R.txt:libxt-dev                                 # Bioconductor Docker
apt_required_compile_R.txt:libbz2-dev                                # Bioconductor Docker
apt_required_compile_R.txt:liblzma-dev                               # Bioconductor Docker
apt_required_compile_R.txt:libpcre2-dev                              # Bioconductor Docker
apt_required_compile_R.txt:libcurl4-openssl-dev                      # Bioconductor Docker
apt_required_compile_R.txt:libncurses-dev                            # Bioconductor Docker         (suggested add from above)

Here's what's only in the BBS:

apt_bioc.txt:firefox                     # for packages using utils::browseURL()
apt_bioc.txt:libgraphviz-dev             # for Rgraphviz
apt_bioc.txt:clustalo                    # for LowMACA
apt_bioc.txt:ocl-icd-opencl-dev          # for gpuMagic
apt_bioc.txt:libavfilter-dev             # for av/spatialHeatmap
apt_bioc.txt:libfribidi-dev              # for EnhancedVolcano
apt_bioc.txt:infernal                    # for inferrnal
apt_bioc.txt:fuse                        # for Travel
apt_bioc.txt:libfuse-dev                 # for Travel
apt_bioc.txt:kallisto                    # for rkal
apt_bioc.txt:mono-runtime                # for rawrr
apt_bioc.txt:libmono-system-data4.0-cil  # for rawrr
apt_cran.txt:librsvg2-dev            # for rsvg
apt_cran.txt:libssl-dev              # for openssl, mongolite
apt_extra_fonts.txt:# APT packages for extra fonts
apt_extra_fonts.txt:gsfonts-x11
apt_extra_fonts.txt:xfonts-base
apt_extra_fonts.txt:xfonts-scalable
apt_extra_fonts.txt:t1-xfree86-nonfree
apt_extra_fonts.txt:ttf-xfree86-nonfree
apt_extra_fonts.txt:ttf-xfree86-nonfree-syriac
apt_nice_to_have.txt:tree
apt_nice_to_have.txt:manpages-dev    # man pages for C standard library
apt_nice_to_have.txt:mlocate         # Provides the locate command
apt_optional_compile_R.txt:gobjc
apt_optional_compile_R.txt:libicu-dev
apt_required_build.txt:python3-minimal
apt_required_build.txt:git
apt_required_compile_R.txt:build-essential
apt_required_compile_R.txt:libx11-dev
apt_required_compile_R.txt:zlib1g-dev
apt_vignettes_reference_manuals.txt:texlive
apt_vignettes_reference_manuals.txt:texlive-font-utils          # for epstopdf
apt_vignettes_reference_manuals.txt:texlive-pstricks            # provides pstricks.sty
apt_vignettes_reference_manuals.txt:texlive-latex-extra         # provides fullpage.sty
apt_vignettes_reference_manuals.txt:texlive-fonts-extra         # provides inconsolata.sty
apt_vignettes_reference_manuals.txt:texlive-bibtex-extra        # provides unsrturl.bst
apt_vignettes_reference_manuals.txt:texlive-science             # provides algorithm.sty
apt_vignettes_reference_manuals.txt:texlive-luatex              # provides luatex85.sty
apt_vignettes_reference_manuals.txt:texlive-lang-european       # provides language definition files e.g. swedish.ldf
apt_vignettes_reference_manuals.txt:texi2html
apt_vignettes_reference_manuals.txt:texinfo
apt_vignettes_reference_manuals.txt:pandoc                      # needed for CRAN package knitr
apt_vignettes_reference_manuals.txt:pandoc-citeproc             # needed for CRAN package knitr
apt_vignettes_reference_manuals.txt:biber
apt_vignettes_reference_manuals.txt:#ttf-mscorefonts-installer

So while it's fine that we don't include the vignette packages, the other files that we do want to install from still contain packages that the current Dockerfile doesn't list. To complicate matters more, rocker already installs some dependencies that we don't need to install again: reinstalling doesn't replace the original package, it just adds another layer (note: we're usually installing a -dev version):

gfortran
libbz2-*
libcurl4
libicu*
libpcre2*
libjpeg-turbo*
libreadline
libtiff*
liblzma*
zlib1g
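
One way to apply wildcard entries like these is shell glob matching. The sketch below is only an assumption about how a skip list with patterns could be applied (the helper names `matches_any` and `drop_preinstalled` are hypothetical). Note that a pattern such as `libbz2-*` would also drop `libbz2-dev`, so patterns need to be chosen with the -dev caveat above in mind:

```shell
# Hypothetical helpers for filtering package names against glob patterns.

# Return 0 if the name in $1 matches any of the glob patterns that follow.
matches_any() {
    name=$1
    shift
    for pattern in "$@"; do
        case $name in
            $pattern) return 0 ;;   # unquoted so that * acts as a wildcard
        esac
    done
    return 1
}

# Read package names on stdin (one per line) and print those that do not
# match any of the patterns given as arguments.
drop_preinstalled() {
    while IFS= read -r pkg; do
        [ -n "$pkg" ] || continue
        matches_any "$pkg" "$@" || printf '%s\n' "$pkg"
    done
}
```

A call such as `drop_preinstalled 'gfortran' 'libtiff*' < candidates.txt` (with a hypothetical candidates.txt) would print only the candidates that match neither pattern. The patterns must be quoted so the shell does not expand them against file names.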

It seems that if we want to install apt_bioc.txt, apt_cran.txt, apt_optional_compile_R, apt_required_compile_R, apt_extra_fonts (do we still need this for the docker?), apt_required_compile_R, and apt_required_build_R, we should expect that the docker is going to be larger because of the extra packages. I think the current method where I exclude packages will give better control; it's just that what's installed needs to become explicit. I also think we should keep the practice of annotating any package dependencies.

I was not able to find any reference for the following packages listed in the Dockerfile when searching code.bioconductor.org, with the exception of the first package. These are the candidates for the apt_docker_only.txt suggested by @hpages . But I think we should remove them if not needed. We could do a test comparing what can be installed with the current docker image and an image without the packages below.

libz-dev                            # ceTF, proBatch
byacc
libhdf5-serial-dev
libgslcblas0
libhdf5-dev
libxpm-dev
libmodule-build-perl
libprotoc-dev
librdf0-dev
libarchive-extract-perl
libfile-copy-recursive-perl
libcgi-pm-perl
libdbi-perl
libdbd-mysql-perl
libxml-simple-perl
sqlite                          # Not needed
openmpi-bin
mpi-default-bin
openmpi-common
openmpi-doc
tabix
imagemagick
ggobi
libzmq3-dev

jwokaty, Jul 13 '21 17:07

# Current devel as of July 23?
 [1] "affyPara"       "canceR"         "CelliD"         "cellity"       
 [5] "CompGO"         "ctgGEM"         "CytoTree"       "fgga"          
 [9] "GateFinder"     "gCrisprTools"   "gpuMagic"       "immunotation"  
[13] "lisaClust"      "methyAnalysis"  "phemd"          "rawrr"         
[17] "SCATE"          "scClassifR"     "schex"          "scTensor"      
[21] "scTGIF"         "SeqSQC"         "spatialHeatmap" "spicyR"        
[25] "SwimR"          "Travel"         "vissE"          "waddR"

# Building without some packages                     
 [1] "CAMERA"         "canceR"         "cellity"        "cicero"        
 [5] "cliqueMS"       "cosmiq"         "ctgGEM"         "CytoTree"      
 [9] "flagme"         "GateFinder"     "gpuMagic"       "igvR"          
[13] "immunotation"   "IPO"            "LOBSTAHS"       "MAIT"          
[17] "meshes"         "meshr"          "Metab"          "metaMS"        
[21] "methylscaper"   "monocle"        "ncGTW"          "phemd"         
[25] "proFIA"         "rawrr"          "RCyjs"          "Risa"          
[29] "scTensor"       "SeqSQC"         "spatialHeatmap" "tenXplore"     
[33] "Travel"         "uSORT"          "xcms"          

Not sure why these are different as this is not what I expected!
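
To see exactly where two such lists differ, they can be compared as sets. A minimal sketch, assuming each list has been saved to a file with one package name per line (the function name `only_in_first` and the file names are hypothetical):

```shell
# Hypothetical helper: print the lines of the first file ($1) that do not
# appear anywhere in the second file ($2). Input order is preserved and
# neither file needs to be sorted.
only_in_first() {
    awk -v second="$2" '
        BEGIN { while ((getline line < second) > 0) seen[line] = 1 }
        !($0 in seen)
    ' "$1"
}
```

Running `only_in_first devel.txt trimmed.txt` and then `only_in_first trimmed.txt devel.txt` gives the packages unique to each build, which narrows down which removals (or unrelated changes in devel) account for the difference.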

The following packages were selected because either we weren't sure which Bioc packages required them or they appeared to be already satisfied by something in rocker. After building with these commented out, I manually reinstalled them one by one to see if they allowed additional Bioc packages to be installed. Only libzmq3-dev made a difference: it allowed RCy3 to be installed.

Appeared to be already installed

byacc
fortran77-compiler
imagemagick
libgmp3-dev
libtiff5-dev
sqlite
ggobi
libarchive-extract-perl
libcgi-pm-perl
libdbd-mysql-perl
libfile-copy-recursive-perl
libgsl0-dev
libhdf5-serial-dev
liblapack-dev
libmariadb-dev-compat
libmodule-build-perl
librdf0-dev
libxml-simple-perl
libxpm-dev
mpi-default-bin
openmpi-doc
tabix

Not needed / No additional Bioc packages were installed after the following apt packages were installed

default-libmysqlclient-dev
libgl1-mesa-dev
libdbi-perl
libprotoc-dev
libz-dev

jwokaty, Jul 28 '21 17:07

I've tried to address all previous comments. However, I'm not attempting to recreate what is in the master branch or the BBS. The aim is a container that installs packages from the BBS and can override entries that we don't want to install (for example, when they've already been installed via Rocker).

The current size is 4.25GB. All but the following Bioconductor packages are installed:

 [1] "ArrayExpressHTS" "brainflowprobes" "BridgeDbR"       "canceR"
 [5] "CHRONOS"         "cn.mops"         "CNVfilteR"       "CNViz"
 [9] "CopyNumberPlots" "CytoTree"        "DaMiRseq"        "debCAM"
[13] "DeepPINCS"       "derfinder"       "derfinderPlot"   "esATAC"
[17] "gaggle"          "GARS"            "IsoGeneGUI"      "miRSM"
[21] "MSGFgui"         "MSGFplus"        "panelcn.mops"    "paxtoolsr"
[25] "phemd"           "psichomics"      "Rcpi"            "recount"
[29] "regionReport"    "ReQON"           "RGMQL"           "RMassBank"
[33] "rmelting"        "RmiR"            "RNAAgeCalc"      "sarks"
[37] "SELEX"           "SICtools"        "VAExprs"

Here's what we actually install in the Docker image. You can see these in the output when the container is built. You can also comment out the cleanup at the bottom of src/install.sh and view the install_apt_pkgs and install_pip_pkgs files to see these packages.

# APT
automake
clustalo
firefox
fuse
graphviz
infernal
jags
kallisto
libavfilter-dev
libcurl4-openssl-dev
libeigen3-dev
libfftw3-dev
libfribidi-dev
libfuse-dev
libgdal-dev
libgeos-dev
libgit2-dev
libglpk-dev
libglu1-mesa-dev
libgmp-dev
libgraphviz-dev
libgsl-dev
libgtk2.0-dev
libgtkmm-2.4-dev
libmagick++-dev
libmono-system-data4.0-cil
libmpfr-dev
libmysqlclient-dev
libnetcdf-dev
libopenbabel-dev
libpoppler-cpp-dev
libpq-dev
libproj-dev
libprotobuf-dev
librsvg2-dev
libsasl2-dev
libsbml5-dev
libssl-dev
libudunits2-dev
libv8-dev
libxml2-dev
mono-runtime
mpi-default-dev
ocl-icd-opencl-dev
protobuf-compiler
python3-pip
tcl-dev
tk-dev
curl
libzmq3-dev
python3-pip

# PIP
h5py
h5pyd
jupyter
matplotlib
mofapy
mofapy2
nbconvert
numpy
phate
scipy
tensorflow_probability
testresources
virtualenv

If this is still too big, I need to know where to cut. I could start removing dependencies required for only a few BioC packages.

If we want to be more explicit, I can write a script to generate the above packages and we can rerun the script every time we want to update the docker image. We can also commit the list of packages.
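
A rough sketch of what such a script could look like, assuming the lists are plain-text files with one package per line and `#` comments. The function name `generate_package_list` is hypothetical, and this is not the logic of the existing install script:

```shell
# Hypothetical helper: merge any number of package list files into one
# explicit list. Comments, surrounding whitespace and blank lines are
# removed, and the result is sorted with duplicates collapsed.
generate_package_list() {
    sed -e 's/#.*//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' "$@" \
        | grep -v '^$' \
        | LC_ALL=C sort -u
}
```

The output could be redirected to a file and committed, so each image update shows up as a reviewable diff of the package list. Sorting with `-u` also collapses entries that appear in more than one list, such as python3-pip above.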

I still kept a list of packages to skip because the BBS files have quite a few packages that were already installed via Rocker.

jwokaty, Sep 08 '21 18:09

Thanks @jwokaty, I will review this today/tomorrow.

nturaga, Sep 08 '21 18:09