Conda environment cannot be built on Mac
While attempting to work through the provided setup instructions, I found I cannot create the conda environment on a Mac.
conda env create -f casp14-baker.yml
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
- astor==0.8.1=py36h06a4308_0
- readline==8.1=h27cfd23_0
- six==1.15.0=py36h06a4308_0
- termcolor==1.1.0=py36h06a4308_1
- tensorboard==1.14.0=py36hf484d3e_0
- c-ares==1.17.1=h27cfd23_0
- hdf5==1.10.6=hb1b8bf9_0
- grpcio==1.36.1=py36h2157cd5_1
- markdown==3.3.4=py36h06a4308_0
- cupti==10.1.168=0
- scipy==1.5.2=py36h0b6359f_0
- protobuf==3.14.0=py36h2531618_1
- numpy==1.19.2=py36h54aff64_0
- mkl_random==1.1.1=py36h0573a6f_0
- absl-py==0.12.0=py36h06a4308_0
- libstdcxx-ng==9.1.0=hdf63c60_0
- cudnn==7.6.5=cuda10.1_0
- python==3.6.13=hdb3f193_0
- mkl==2020.2=256
- sqlite==3.35.4=hdfb4753_0
- tk==8.6.10=hbc83047_0
- certifi==2020.12.5=py36h06a4308_0
- importlib-metadata==3.10.0=py36h06a4308_0
- openssl==1.1.1k=h27cfd23_0
- wrapt==1.12.1=py36h7b6447c_1
- coverage==5.5=py36h27cfd23_2
- libgcc-ng==9.1.0=hdf63c60_0
- intel-openmp==2021.2.0=h06a4308_610
- tensorflow-gpu==1.14.0=h0d30ee6_0
- _tflow_select==2.1.0=gpu
- h5py==2.10.0=py36hd6299e0_1
- mkl_fft==1.3.0=py36h54f3939_0
- libffi==3.3=he6710b0_2
- ld_impl_linux-64==2.33.1=h53a641e_7
- cudatoolkit==10.1.243=h6bb024c_0
- libgfortran-ng==7.3.0=hdf63c60_0
- tensorflow-base==1.14.0=gpu_py36he45bfe2_0
- xz==5.2.5=h7b6447c_0
- libprotobuf==3.14.0=h8c45485_0
- pip==21.0.1=py36h06a4308_0
- tensorflow==1.14.0=gpu_py36h3fb9ad6_0
- cython==0.29.23=py36h2531618_0
- ncurses==6.2=he6710b0_1
- zlib==1.2.11=h7b6447c_3
- setuptools==52.0.0=py36h06a4308_0
- ca-certificates==2021.4.13=h06a4308_1
- mkl-service==2.3.0=py36he8ac12f_0
- numpy-base==1.19.2=py36hfa32c7d_0
Removing the build string from each specification (i.e. everything from the rightmost equals sign to the end of the line) reduces the number of packages not found, but the remaining list still includes several packages that are unavailable on macOS, including the CUDA-related ones. Removing those unavailable packages then produces unresolvable package conflicts, and the environment still fails to build.
Proposed solution: provide a minimal environment for network inference on a Mac/CPU.
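For anyone who wants to reproduce the build-string removal described above, here is a minimal sketch, not the exact procedure used in this report. The function name strip_build and the output file name in the usage note are hypothetical. It assumes each dependency sits on its own line in the form "- name=version=build" (one or two equals signs before the version).

```shell
#!/bin/bash
# strip_build (hypothetical helper): read an environment file on stdin and
# write it to stdout with the trailing "=<build>" removed from every
# "- name=version=build" entry. Lines without a build string are unchanged.
strip_build() {
    sed -E 's/^([[:space:]]*-[[:space:]]*[^=[:space:]]+=+[^=[:space:]]+)=[^=[:space:]]+[[:space:]]*$/\1/'
}
```

It could be used as `strip_build < casp14-baker.yml > casp14-baker-nobuild.yml`. As noted above, this alone is not enough on a Mac, because the Linux-only and CUDA packages still have to be removed by hand.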
Brian, could you try casp14-baker-mac.yml?
Thank you for the fast turnaround! The new env file builds correctly on my Mac, and I have begun to work through the remainder of the setup instructions. I modified install_dependencies.sh to install hhsuite (conda install -c bioconda hhsuite -y) and psipred (conda install -c biocore psipred -y) with conda. Everything works so far aside from blast-legacy, which is precompiled as a 32-bit executable, while current Macs will only execute 64-bit code.
Do you have any instructions for making a local build of blast-legacy? Thanks.
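A quick way to confirm the 32-bit problem described above is to look at what the `file` utility reports for the executable. This is a minimal sketch: the helper names has_64bit_slice and report_arch are hypothetical, and it assumes that `file` includes the text "64-bit" in its description of 64-bit Mach-O and ELF executables.

```shell
#!/bin/bash
# has_64bit_slice (hypothetical helper): succeed if a description string, as
# printed by `file`, mentions a 64-bit executable.
has_64bit_slice() {
    case "$1" in
        *64-bit*) return 0 ;;
        *) return 1 ;;
    esac
}

# report_arch (hypothetical helper): look up an executable on PATH and say
# whether `file` reports a 64-bit slice for it.
report_arch() {
    local exe
    exe=$(command -v "$1") || { echo "$1: not found on PATH"; return 1; }
    if has_64bit_slice "$(file -b "$exe")"; then
        echo "$1: has a 64-bit slice"
    else
        echo "$1: no 64-bit slice found"
    fi
}
```

Running `report_arch blastpgp` after installing blast-legacy should then show whether the installed binary can run on a current Mac.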
You can get the legacy-blast source code from here:
http://wwwuser.gwdg.de/~compbiol/data/csblast/releases/csblast-2.2.3_src.tar.gz. And thank you for the suggestions about installing hhsuite and psipred from conda!
I was thinking about blastpgp not csblast. Are both not required?
Sorry, messed it up: both blast and csblast are required. All pre-compiled blast versions are located here https://ftp.ncbi.nlm.nih.gov/blast/executables/legacy.NOTSUPPORTED/2.2.26/. I tested blast-2.2.26-universal-macosx.tar.gz on Darwin 18.7.0 where it worked fine. I'll have a look whether it can be replaced with something else.
Unfortunately the pre-built executables will not execute on the current versions of macOS. I did a little work and got it to build doing the following:
wget https://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/old/20120620/ncbi.tar.gz
tar xf ncbi.tar.gz
patch -p0 < fix-macos-build.patch
./ncbi/make/makedis.csh 2>&1 | tee out.makedis.txt
where fix-macos-build.patch contains
--- ncbi/api/asn2gnb2.c.orig 2021-05-21 18:31:24.000000000 -0700
+++ ncbi/api/asn2gnb2.c 2021-05-21 18:32:09.000000000 -0700
@@ -56,6 +56,7 @@
#include <edutil.h>
#include <alignmgr2.h>
#include <asn2gnbi.h>
+#include <ffprint.h>
#ifdef WIN_MAC
#if __profile__
--- ncbi/api/salutil.c.orig 2021-05-21 17:33:10.000000000 -0700
+++ ncbi/api/salutil.c 2021-05-21 17:33:34.000000000 -0700
@@ -51,6 +51,7 @@
#include <edutil.h>
#include <sequtil.h>
#include <sqnutils.h>
+#include <alignmgr2.h>
#ifdef SALSA_DEBUG
#include <simutil.h>
--- ncbi/network/wwwblast/Src/viewgif.c.orig 2021-05-21 17:34:12.000000000 -0700
+++ ncbi/network/wwwblast/Src/viewgif.c 2021-05-21 17:34:39.000000000 -0700
@@ -45,6 +45,7 @@
#include <string.h>
#include <signal.h>
#include <fcntl.h>
+#include <unistd.h>
static void SigAlrmHandler(int);
--- ncbi/network/wwwblast/Src/wblast2.c.orig 2021-05-21 17:35:19.000000000 -0700
+++ ncbi/network/wwwblast/Src/wblast2.c 2021-05-21 17:36:02.000000000 -0700
@@ -296,6 +296,9 @@
#include <algo/blast/api/twoseq_api.h>
#endif
+#include <algo/blast/core/blast_util.h>
+#include <accid1.h>
+
#define MY_BLOSUM62 0
#define MY_PAM30 1
#define MY_PAM70 2
--- ncbi/corelib/ncbimisc.c.orig 2021-05-21 18:20:30.000000000 -0700
+++ ncbi/corelib/ncbimisc.c 2021-05-21 18:20:54.000000000 -0700
@@ -1266,7 +1266,7 @@
if (len < 1) return NULL;
rsult = (Nlm_CharPtr) MemNew (len + 3);
- if (rsult == NULL) return;
+ if (rsult == NULL) return NULL;
tmp = rsult;
for (i = 0; /* local [i] != NULL */ i < numitems; i++) {
--- ncbi/make/makedis.csh 23 Mar 2009 17:10:14 -0000
+++ ncbi/make/makedis.csh 9 Nov 2009 18:44:01 -0000
@@ -240,7 +240,7 @@
endif
endif
set HAVE_MOTIF=0
- set HAVE_MAC=1
+ # set HAVE_MAC=1
breaksw
case NetBSD:
set platform=netbsd
This may work too (at least works on Linux and an older Mac which I'm using for testing):
conda install -c bioconda blast-legacy
Sadly, that has the same issue (bad cpu type).
Hey Ivan, I just submitted https://github.com/bioconda/bioconda-recipes/pull/28672 to fix the blast-legacy build – conda install -c bioconda blast-legacy should work on current Macs now!
Thanks a lot, Brian! With your help the whole installation process is now much more transparent.
Oh no, after attempting to build this from scratch, I am still getting conda conflicts. I am going to try removing the build specifications and possibly some versioning info (starting with z if the version is x.y.z). Assuming that a lot of these requirements are dependencies of dependencies, is there a minimal set of requirements that I could start from? I'm pretty excited to get this up and running!
I would start with the ones in the list below (python v 3.6):
tensorflow==1.14
pytorch==1.4
scikit-learn
pandas
tape_proteins
Thanks! This simplified version builds and I will run some tests soon!
name: casp14-baker
channels:
  - defaults
  - bioconda
  - biocore
  - conda-forge
  - https://<user>:<password>@conda.graylab.jhu.edu
dependencies:
  - biopython=1.78
  - blast-legacy=2.2.26
  - hhsuite
  - numpy=1.19
  - pandas
  - psipred=4.01
  - pyrosetta
  - python=3.6
  - pytorch=1.4
  - scikit-learn=0.24
  - scipy=1.5
  - tensorflow=1.14
  - pip:
    - tape-proteins==0.4
and the install_dependencies.sh I am using is
#!/bin/bash
case "$(uname -s)" in
    Linux*)  platform=linux;;
    Darwin*) platform=macosx;;
    *)       echo "unsupported OS type. exiting"; exit 1;;
esac
echo "installing for ${platform}"
# download lddt
echo "downloading lddt . . ."
wget https://openstructure.org/static/lddt-${platform}.zip -O lddt.zip
unzip -d lddt -j lddt.zip
# the cs-blast platform description includes the width of memory addresses
# we expect a 64-bit operating system
if [[ ${platform} == "linux" ]]; then
    platform=${platform}64
fi
# download cs-blast
echo "downloading cs-blast . . ."
wget http://wwwuser.gwdg.de/~compbiol/data/csblast/releases/csblast-2.2.3_${platform}.tar.gz -O csblast-2.2.3.tar.gz
mkdir -p csblast-2.2.3
tar xf csblast-2.2.3.tar.gz -C csblast-2.2.3 --strip-components=1
# download and install gnu-parallel if it's not already installed
if ! command -v parallel &> /dev/null; then
    echo "downloading gnu-parallel . . ."
    wget https://ftpmirror.gnu.org/parallel/parallel-latest.tar.bz2
    mkdir -p parallel
    tar xf parallel-latest.tar.bz2 -C parallel --strip-components=1
    cd parallel
    # build in a subshell so that all build output lands in the two log files
    (
        ./configure --prefix="$(pwd)" && make && make install
    ) > install.stdout 2> install.stderr
    cd ..
fi
If the tests all pass, I can open a PR with these changes if you'd like.
Cool! It seems that parallel can also be installed with conda install, from either the bioconda or the conda-forge channel.
Oh neat! I'll update for that as well.
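The conda route for parallel mentioned above might look like the sketch below. This is not the final contents of install_dependencies.sh. The function name ensure_parallel is hypothetical, and it assumes the conda package is named parallel on the conda-forge channel.

```shell
#!/bin/bash
# ensure_parallel (hypothetical helper): install GNU parallel with conda
# only when no `parallel` command is already available.
ensure_parallel() {
    if command -v parallel > /dev/null 2>&1; then
        echo "parallel already installed"
    else
        # assumption: the package is named "parallel" on conda-forge
        conda install -y -c conda-forge parallel
    fi
}
```

Calling `ensure_parallel` inside the activated environment would replace the download-and-build block for gnu-parallel, which removes the need for a local compiler toolchain.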
It looks like both lddt and csblast are GPLv3 - I'll see if we can add them to bioconda and greatly simplify deployment.