Conda environment cannot be built on Mac

Open weitzner opened this issue 4 years ago • 16 comments

While attempting to work through the provided setup instructions, I found I cannot create the conda environment on a Mac.

conda env create -f casp14-baker.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - astor==0.8.1=py36h06a4308_0
  - readline==8.1=h27cfd23_0
  - six==1.15.0=py36h06a4308_0
  - termcolor==1.1.0=py36h06a4308_1
  - tensorboard==1.14.0=py36hf484d3e_0
  - c-ares==1.17.1=h27cfd23_0
  - hdf5==1.10.6=hb1b8bf9_0
  - grpcio==1.36.1=py36h2157cd5_1
  - markdown==3.3.4=py36h06a4308_0
  - cupti==10.1.168=0
  - scipy==1.5.2=py36h0b6359f_0
  - protobuf==3.14.0=py36h2531618_1
  - numpy==1.19.2=py36h54aff64_0
  - mkl_random==1.1.1=py36h0573a6f_0
  - absl-py==0.12.0=py36h06a4308_0
  - libstdcxx-ng==9.1.0=hdf63c60_0
  - cudnn==7.6.5=cuda10.1_0
  - python==3.6.13=hdb3f193_0
  - mkl==2020.2=256
  - sqlite==3.35.4=hdfb4753_0
  - tk==8.6.10=hbc83047_0
  - certifi==2020.12.5=py36h06a4308_0
  - importlib-metadata==3.10.0=py36h06a4308_0
  - openssl==1.1.1k=h27cfd23_0
  - wrapt==1.12.1=py36h7b6447c_1
  - coverage==5.5=py36h27cfd23_2
  - libgcc-ng==9.1.0=hdf63c60_0
  - intel-openmp==2021.2.0=h06a4308_610
  - tensorflow-gpu==1.14.0=h0d30ee6_0
  - _tflow_select==2.1.0=gpu
  - h5py==2.10.0=py36hd6299e0_1
  - mkl_fft==1.3.0=py36h54f3939_0
  - libffi==3.3=he6710b0_2
  - ld_impl_linux-64==2.33.1=h53a641e_7
  - cudatoolkit==10.1.243=h6bb024c_0
  - libgfortran-ng==7.3.0=hdf63c60_0
  - tensorflow-base==1.14.0=gpu_py36he45bfe2_0
  - xz==5.2.5=h7b6447c_0
  - libprotobuf==3.14.0=h8c45485_0
  - pip==21.0.1=py36h06a4308_0
  - tensorflow==1.14.0=gpu_py36h3fb9ad6_0
  - cython==0.29.23=py36h2531618_0
  - ncurses==6.2=he6710b0_1
  - zlib==1.2.11=h7b6447c_3
  - setuptools==52.0.0=py36h06a4308_0
  - ca-certificates==2021.4.13=h06a4308_1
  - mkl-service==2.3.0=py36he8ac12f_0
  - numpy-base==1.19.2=py36hfa32c7d_0

Removing the build string from each pin (i.e., everything from the rightmost equals sign to the end of the line) reduces the number of packages not found, but the remaining list still includes several packages that are unavailable on osx, including the CUDA-related ones. Removing those unavailable packages then produces unresolvable package conflicts, and the environment still fails to build.
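For reference, the build-string removal described above could be scripted roughly as follows. This is a minimal sketch, not part of the repository: the function name strip_build_strings and the output filename in the usage note are hypothetical, and it assumes each pin sits on its own line as "- name==version=build" or "- name=version=build".

```shell
# Hypothetical helper: drop the build string (the last "=..." field) from
# conda pins of the form "- name==version=build" or "- name=version=build".
# Lines without a build string (e.g. "- pip:" or "- pandas") pass through unchanged.
strip_build_strings() {
    sed -E 's/^([[:space:]]*-[[:space:]]+[^=[:space:]]+==?[^=[:space:]]+)=[^=[:space:]]+[[:space:]]*$/\1/' "$@"
}
```

It could be used as strip_build_strings casp14-baker.yml > casp14-baker-nobuild.yml. As noted above, this step alone is not enough on a Mac, because the Linux-only and CUDA packages remain in the list.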

Proposed solution: provide a minimal environment for network inference on a Mac/cpu.

weitzner avatar May 20 '21 21:05 weitzner

Brian, could you try casp14-baker-mac.yml?

gjoni avatar May 21 '21 05:05 gjoni

Thank you for the fast turnaround! The new env file builds correctly on my Mac, and I have begun to work through the remainder of the setup instructions. In install_dependencies.sh, I modified the script to use conda to install hhsuite (conda install -c bioconda hhsuite -y) and psipred (conda install -c biocore psipred -y). Everything works so far aside from blast-legacy, which is precompiled as a 32-bit executable, while current Macs will only execute 64-bit code.

Do you have any instructions for making a local build of blast-legacy? Thanks.
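One way to confirm the 32-bit problem before running anything is to inspect the executable with the file utility. The sketch below is illustrative only: the function names are hypothetical, and it assumes that file reports 64-bit Mach-O or ELF code with the substring "64-bit" in its description, which may vary between versions of file.

```shell
# Hypothetical helper: classify the text printed by `file -b <binary>`.
# Assumption: any 64-bit slice is described with the substring "64-bit".
describes_64bit() {
    case "$1" in
        *64-bit*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical wrapper: $1 is the path to an executable.
check_binary() {
    if describes_64bit "$(file -b "$1")"; then
        echo "$1: contains 64-bit code"
    else
        echo "$1: no 64-bit code reported"
    fi
}
```

For example, check_binary "$(command -v blastpgp)" would flag a 32-bit-only blastpgp before it fails at run time.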

weitzner avatar May 21 '21 20:05 weitzner

You can get the legacy-blast source code from here: http://wwwuser.gwdg.de/~compbiol/data/csblast/releases/csblast-2.2.3_src.tar.gz. And thank you for the suggestions about installing hhsuite and psipred from conda!

gjoni avatar May 21 '21 22:05 gjoni

I was thinking about blastpgp not csblast. Are both not required?

weitzner avatar May 21 '21 23:05 weitzner

Sorry, I mixed that up: both blast and csblast are required. All pre-compiled blast versions are located here: https://ftp.ncbi.nlm.nih.gov/blast/executables/legacy.NOTSUPPORTED/2.2.26/. I tested blast-2.2.26-universal-macosx.tar.gz on Darwin 18.7.0, where it worked fine. I'll look into whether it can be replaced with something else.

gjoni avatar May 22 '21 00:05 gjoni

Unfortunately the pre-built executables will not execute on the current versions of macOS. I did a little work and got it to build doing the following:

wget https://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/old/20120620/ncbi.tar.gz
tar xf ncbi.tar.gz
patch -p0 < fix-macos-build.patch
./ncbi/make/makedis.csh 2>&1 | tee out.makedis.txt

where fix-macos-build.patch contains

--- ncbi/api/asn2gnb2.c.orig	2021-05-21 18:31:24.000000000 -0700
+++ ncbi/api/asn2gnb2.c	2021-05-21 18:32:09.000000000 -0700
@@ -56,6 +56,7 @@
 #include <edutil.h>
 #include <alignmgr2.h>
 #include <asn2gnbi.h>
+#include <ffprint.h>

 #ifdef WIN_MAC
 #if __profile__
--- ncbi/api/salutil.c.orig	2021-05-21 17:33:10.000000000 -0700
+++ ncbi/api/salutil.c	2021-05-21 17:33:34.000000000 -0700
@@ -51,6 +51,7 @@
 #include <edutil.h>
 #include <sequtil.h>
 #include <sqnutils.h>
+#include <alignmgr2.h>

 #ifdef SALSA_DEBUG
 #include <simutil.h>
--- ncbi/network/wwwblast/Src/viewgif.c.orig	2021-05-21 17:34:12.000000000 -0700
+++ ncbi/network/wwwblast/Src/viewgif.c	2021-05-21 17:34:39.000000000 -0700
@@ -45,6 +45,7 @@
 #include <string.h>
 #include <signal.h>
 #include <fcntl.h>
+#include <unistd.h>


 static void SigAlrmHandler(int);
--- ncbi/network/wwwblast/Src/wblast2.c.orig	2021-05-21 17:35:19.000000000 -0700
+++ ncbi/network/wwwblast/Src/wblast2.c	2021-05-21 17:36:02.000000000 -0700
@@ -296,6 +296,9 @@
 #include <algo/blast/api/twoseq_api.h>
 #endif

+#include <algo/blast/core/blast_util.h>
+#include <accid1.h>
+
 #define MY_BLOSUM62 0
 #define MY_PAM30 1
 #define MY_PAM70 2
--- ncbi/corelib/ncbimisc.c.orig	2021-05-21 18:20:30.000000000 -0700
+++ ncbi/corelib/ncbimisc.c	2021-05-21 18:20:54.000000000 -0700
@@ -1266,7 +1266,7 @@
   if (len < 1) return NULL;
 
   rsult = (Nlm_CharPtr) MemNew (len + 3);
-  if (rsult == NULL) return;
+  if (rsult == NULL) return NULL;
   tmp = rsult;
 
   for (i = 0; /* local [i] != NULL */ i < numitems; i++) {
--- ncbi/make/makedis.csh	23 Mar 2009 17:10:14 -0000
+++ ncbi/make/makedis.csh	9 Nov 2009 18:44:01 -0000
@@ -240,7 +240,7 @@
 		endif
 	endif
 	set HAVE_MOTIF=0
-	set HAVE_MAC=1
+	# set HAVE_MAC=1
 	breaksw
 case NetBSD:
 	set platform=netbsd

weitzner avatar May 22 '21 01:05 weitzner

This may work too (at least it works on Linux and on an older Mac that I'm using for testing): conda install -c bioconda blast-legacy

gjoni avatar May 22 '21 03:05 gjoni

Sadly, that has the same issue (bad cpu type).

weitzner avatar May 25 '21 17:05 weitzner

Hey Ivan, I just submitted https://github.com/bioconda/bioconda-recipes/pull/28672 to fix the blast-legacy build – conda install -c bioconda blast-legacy should work on current Macs now!

weitzner avatar May 26 '21 19:05 weitzner

Thanks a lot, Brian! With your help, the whole installation process is now much more transparent.

gjoni avatar May 27 '21 04:05 gjoni

Oh no, after attempting to build this from scratch, I am still getting conda conflicts. I am going to try removing the build specifications and possibly some versioning info (starting with z if the version is x.y.z). Assuming that a lot of these requirements are dependencies of dependencies, is there a minimal set of requirements that I could start from? I'm pretty excited to get this up and running!

weitzner avatar May 28 '21 19:05 weitzner

I would start with the ones in the list below (python v 3.6):

tensorflow==1.14
pytorch==1.4
scikit-learn
pandas
tape_proteins

gjoni avatar May 28 '21 22:05 gjoni

Thanks! This simplified version builds and I will run some tests soon!

name: casp14-baker
channels:
  - defaults
  - bioconda
  - biocore
  - conda-forge
  - https://<user>:<password>@conda.graylab.jhu.edu
dependencies:
  - biopython=1.78
  - blast-legacy=2.2.26
  - hhsuite
  - numpy=1.19
  - pandas
  - psipred=4.01
  - pyrosetta
  - python=3.6
  - pytorch=1.4
  - scikit-learn=0.24
  - scipy=1.5
  - tensorflow=1.14
  - pip:
    - tape-proteins==0.4

and the install_dependencies.sh I am using is

#!/bin/bash

case "$(uname -s)" in
    Linux*)     platform=linux;;
    Darwin*)    platform=macosx;;
    *)          echo "unsupported OS type. exiting"; exit 1
esac
echo "installing for ${platform}"

# download lddt
echo "downloading lddt . . ."
wget https://openstructure.org/static/lddt-${platform}.zip -O lddt.zip
unzip -d lddt -j lddt.zip

# the cs-blast platform description includes the width of memory addresses
# we expect a 64-bit operating system
if [[ ${platform} == "linux" ]]; then
    platform=${platform}64
fi

# download cs-blast
echo "downloading cs-blast . . ."
wget http://wwwuser.gwdg.de/~compbiol/data/csblast/releases/csblast-2.2.3_${platform}.tar.gz -O csblast-2.2.3.tar.gz
mkdir -p csblast-2.2.3
tar xf csblast-2.2.3.tar.gz -C csblast-2.2.3 --strip-components=1

# download and install gnu-parallel if it's not already installed
if ! command -v parallel &> /dev/null; then
    echo "downloading gnu-parallel . . ."
    wget https://ftpmirror.gnu.org/parallel/parallel-latest.tar.bz2
    mkdir -p parallel
    tar xf parallel-latest.tar.bz2 -C parallel --strip-components=1
    cd parallel
    (
    ./configure --prefix="$(pwd)" && make && make install
    ) > install.stdout 2> install.stderr
    cd ..
fi

If the tests all pass, I can open a PR with these changes if you'd like.

weitzner avatar May 28 '21 23:05 weitzner

Cool! It seems that parallel can also be installed with conda install, from either the bioconda or the conda-forge channel.
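A conda-based install could replace the source build in install_dependencies.sh along these lines. This is only a sketch: ensure_tool is a hypothetical helper, and the package name parallel on conda-forge is an assumption based on the comment above rather than something verified in this thread.

```shell
# Hypothetical helper: run an install command only when a tool is missing.
# $1 is the executable name; the remaining arguments are the install command.
ensure_tool() {
    tool=$1
    shift
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool already installed"
    else
        echo "installing $tool . . ."
        "$@"
    fi
}
```

Usage would look like ensure_tool parallel conda install -c conda-forge parallel -y, which keeps the existing "skip if already installed" behaviour of the script while dropping the configure/make steps.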

gjoni avatar May 28 '21 23:05 gjoni

Oh neat! I'll update for that as well

weitzner avatar May 28 '21 23:05 weitzner

It looks like both lddt and csblast are GPLv3 - I'll see if we can add them to bioconda and greatly simplify deployment.

weitzner avatar May 29 '21 00:05 weitzner