Error converting symbolic tensor object to numpy while running ReLERNN_TRAIN
While running ReLERNN_TRAIN, I ran into the following error, which appears to come from a failure to convert a tensor object to a numpy array. It happened after a fresh install of a conda environment using the dependency versions specified in the documentation (tensorflow/2.2.0, cudatoolkit/10.1.243, and cudnn/7.6.5). Any help would be greatly appreciated!
2024-12-19 09:15:57.795018: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2024-12-19 09:15:57.871059: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2944210000 Hz
2024-12-19 09:15:57.871375: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555559e7dcf0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2024-12-19 09:15:57.871560: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2024-12-19 09:15:57.884792: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2024-12-19 09:16:28.739187: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
Traceback (most recent call last):
File "/home/brscott4/.conda/envs/relernn/bin/ReLERNN_TRAIN", line 130, in <module>
main()
File "/home/brscott4/.conda/envs/relernn/bin/ReLERNN_TRAIN", line 109, in main
runModels(ModelFuncPointer=GRU_TUNED84,
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/ReLERNN/helpers.py", line 344, in runModels
model = ModelFuncPointer(x,y)
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/ReLERNN/networks.py", line 19, in GRU_TUNED84
model = layers.Bidirectional(layers.GRU(84,return_sequences=False))(genotype_inputs)
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/wrappers.py", line 531, in __call__
return super(Bidirectional, self).__call__(inputs, **kwargs)
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 922, in __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/wrappers.py", line 644, in call
y = self.forward_layer(forward_inputs,
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 654, in __call__
return super(RNN, self).__call__(inputs, **kwargs)
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 922, in __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent_v2.py", line 408, in call
inputs, initial_state, _ = self._process_inputs(inputs, initial_state, None)
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 848, in _process_inputs
initial_state = self.get_initial_state(inputs)
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 636, in get_initial_state
init_state = get_initial_state_fn(
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 1910, in get_initial_state
return _generate_zero_filled_state_for_cell(self, inputs, batch_size, dtype)
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 2926, in _generate_zero_filled_state_for_cell
return _generate_zero_filled_state(batch_size, cell.state_size, dtype)
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 2944, in _generate_zero_filled_state
return create_zeros(state_size)
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 2939, in create_zeros
return array_ops.zeros(init_state_size, dtype=dtype)
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 2677, in wrapped
tensor = fun(*args, **kwargs)
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 2721, in zeros
output = _constant_if_small(zero, shape, dtype, name)
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 2662, in _constant_if_small
if np.prod(shape) < 1000:
File "<__array_function__ internals>", line 180, in prod
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 3045, in prod
return _wrapreduction(a, np.multiply, 'prod', axis, dtype, out,
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 748, in __array__
raise NotImplementedError("Cannot convert a symbolic Tensor ({}) to a numpy"
NotImplementedError: Cannot convert a symbolic Tensor (bidirectional/forward_gru/strided_slice:0) to a numpy array.
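A rough way to triage this traceback is to compare the TensorFlow and NumPy versions in the environment. The sketch below assumes, based on reports outside this thread, that the error tends to appear when an older TensorFlow 2.x release runs against NumPy 1.20 or newer. Both thresholds and the helper names are hypothetical and should be checked against the TensorFlow release notes before relying on them.

```python
def parse_version(text):
    """Turn '1.23.5' into (1, 23, 5); anything after the leading digits of a piece is ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


# ASSUMPTIONS, not facts from this thread: adjust after checking release notes.
NUMPY_FIRST_SUSPECT = (1, 20)   # first NumPy release assumed to trigger the error
TF_ASSUMED_FIXED = (2, 5)       # first TensorFlow release assumed to be unaffected


def likely_affected(tf_version, numpy_version,
                    tf_fixed=TF_ASSUMED_FIXED, numpy_suspect=NUMPY_FIRST_SUSPECT):
    """Return True when the version pair matches the assumed problem range."""
    old_tf = parse_version(tf_version) < tf_fixed
    new_numpy = parse_version(numpy_version) >= numpy_suspect
    return old_tf and new_numpy
```

Calling `likely_affected("2.2.0", "1.23.5")` flags the pair under these assumptions, while `likely_affected("2.15.0", "1.26.4")` does not. A flagged pair is only a hint about where to look, not a diagnosis.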
hi there-- to help debug this can you give me a full list of the versions in your python environment? assuming you use conda you can get this with conda list
also are you getting this error trying to run our example input?
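If the full conda list is too long to scan, a shorter report of the packages most relevant here can be produced from Python. This is a minimal sketch using the standard-library `importlib.metadata`. The package names passed in are only a guess at what matters for this issue, and packages installed without Python metadata would show up as missing.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if not found."""
    report = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Hypothetical shortlist; extend it with whatever else seems relevant.
    shortlist = ["tensorflow", "numpy", "h5py", "keras", "ReLERNN"]
    for name, version in installed_versions(shortlist).items():
        print(f"{name:12s} {version or 'not installed'}")
```

Run it with the interpreter from the environment in question so that it reports on that environment and not another one on the PATH.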
Hello, thanks for getting back to me. Yes I am getting the same error when I try to run the sample input.
Here are my conda environment details:
(relernn) [brscott4@sc003:/scratch/brscott4/downloads]$ conda list
# packages in environment at /home/brscott4/.conda/envs/relernn:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
_tflow_select 2.3.0 mkl
absl-py 2.1.0 pyhd8ed1ab_0 conda-forge
alsa-lib 1.2.13 hb9d3cd8_0 conda-forge
apricot-select 0.6.1 pyhd8ed1ab_0 conda-forge
asciitree 0.3.3 py_2 conda-forge
astor 0.8.1 pyh9f0ad1d_0 conda-forge
astunparse 1.6.3 pyhd8ed1ab_2 conda-forge
attr 2.5.1 h166bdaf_1 conda-forge
attrs 24.2.0 pyh71513ae_0 conda-forge
aws-c-auth 0.7.22 h96bc93b_2 conda-forge
aws-c-cal 0.6.14 h88a6e22_1 conda-forge
aws-c-common 0.9.19 h4ab18f5_0 conda-forge
aws-c-compression 0.2.18 h83b837d_6 conda-forge
aws-c-event-stream 0.4.2 ha47c788_12 conda-forge
aws-c-http 0.8.1 h29d6fba_17 conda-forge
aws-c-io 0.14.8 h21d4f22_5 conda-forge
aws-c-mqtt 0.10.4 h759edc4_4 conda-forge
aws-c-s3 0.5.9 h594631b_3 conda-forge
aws-c-sdkutils 0.1.16 h83b837d_2 conda-forge
aws-checksums 0.1.18 h83b837d_6 conda-forge
aws-crt-cpp 0.26.9 he3a8b3b_0 conda-forge
aws-sdk-cpp 1.11.329 hba8bd5f_3 conda-forge
blas 1.1 openblas conda-forge
bokeh 3.1.1 pyhd8ed1ab_0 conda-forge
brotli 1.1.0 hd590300_1 conda-forge
brotli-bin 1.1.0 hd590300_1 conda-forge
brotli-python 1.1.0 py38h17151c0_1 conda-forge
bzip2 1.0.8 h4bc722e_7 conda-forge
c-ares 1.34.4 hb9d3cd8_0 conda-forge
ca-certificates 2024.12.14 hbcca054_0 conda-forge
cairo 1.18.0 h3faef2a_0 conda-forge
certifi 2024.8.30 pyhd8ed1ab_0 conda-forge
cffi 1.17.0 py38heb5c249_0 conda-forge
charset-normalizer 3.4.0 pyhd8ed1ab_0 conda-forge
click 8.1.7 unix_pyh707e725_0 conda-forge
cloudpickle 3.1.0 pyhd8ed1ab_1 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
contourpy 1.1.1 py38h7f3f72f_1 conda-forge
cudatoolkit 10.1.243 h6d9799a_13 conda-forge
cudnn 7.6.5.32 hc0a50b0_1 conda-forge
cycler 0.12.1 pyhd8ed1ab_0 conda-forge
cytoolz 0.12.3 py38h01eb140_0 conda-forge
dask 2023.5.0 pyhd8ed1ab_0 conda-forge
dask-core 2023.5.0 pyhd8ed1ab_0 conda-forge
dbus 1.13.6 h5008d03_3 conda-forge
demes 0.2.3 pyhd8ed1ab_0 conda-forge
distributed 2023.5.0 pyhd8ed1ab_0 conda-forge
expat 2.6.4 h5888daf_0 conda-forge
fasteners 0.17.3 pyhd8ed1ab_0 conda-forge
filelock 3.16.1 pyhd8ed1ab_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 h77eed37_3 conda-forge
fontconfig 2.14.2 h14ed4e7_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.53.1 py38h2019614_0 conda-forge
freetype 2.12.1 h267a509_2 conda-forge
fsspec 2024.10.0 pyhff2d567_0 conda-forge
gast 0.3.3 py_0 conda-forge
gettext 0.22.5 he02047a_3 conda-forge
gettext-tools 0.22.5 he02047a_3 conda-forge
gflags 2.2.2 h5888daf_1005 conda-forge
glib 2.80.2 hf974151_0 conda-forge
glib-tools 2.80.2 hb6ce0ca_0 conda-forge
glog 0.7.1 hbabe93e_0 conda-forge
gmp 6.3.0 hac33072_2 conda-forge
gmpy2 2.1.5 py38h6a1700d_1 conda-forge
google-pasta 0.2.0 pyhd8ed1ab_1 conda-forge
graphite2 1.3.13 h59595ed_1003 conda-forge
grpcio 1.62.2 py38h94a1851_0 conda-forge
gsl 2.7 he838d99_0 conda-forge
gst-plugins-base 1.24.4 h9ad1361_0 conda-forge
gstreamer 1.24.4 haf2f30d_0 conda-forge
h2 4.1.0 pyhd8ed1ab_0 conda-forge
h5py 2.10.0 nompi_py38h9915d05_106 conda-forge
harfbuzz 8.5.0 hfac3d4d_0 conda-forge
hdf5 1.10.6 h3ffc7dd_1
hpack 4.0.0 pyh9f0ad1d_0 conda-forge
hyperframe 6.0.1 pyhd8ed1ab_0 conda-forge
icu 73.2 h59595ed_0 conda-forge
idna 3.10 pyhd8ed1ab_0 conda-forge
importlib-metadata 8.5.0 pyha770c72_0 conda-forge
importlib-resources 6.4.5 pyhd8ed1ab_0 conda-forge
importlib_metadata 8.5.0 hd8ed1ab_1 conda-forge
importlib_resources 6.4.5 pyhd8ed1ab_0 conda-forge
jinja2 3.1.4 pyhd8ed1ab_0 conda-forge
joblib 1.4.2 pyhd8ed1ab_0 conda-forge
jsonschema 4.23.0 pyhd8ed1ab_0 conda-forge
jsonschema-specifications 2024.10.1 pyhd8ed1ab_0 conda-forge
keras-preprocessing 1.1.2 pyhd8ed1ab_0 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
kiwisolver 1.4.5 py38h7f3f72f_1 conda-forge
krb5 1.21.3 h659f571_0 conda-forge
lame 3.100 h166bdaf_1003 conda-forge
lcms2 2.16 hb7c19ff_0 conda-forge
ld_impl_linux-64 2.43 h712a8e2_2 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libabseil 20240116.2 cxx17_he02047a_1 conda-forge
libarrow 16.1.0 hcb6531f_6_cpu conda-forge
libarrow-acero 16.1.0 hac33072_6_cpu conda-forge
libarrow-dataset 16.1.0 hac33072_6_cpu conda-forge
libarrow-substrait 16.1.0 h7e0c224_6_cpu conda-forge
libasprintf 0.22.5 he8f35ee_3 conda-forge
libasprintf-devel 0.22.5 he8f35ee_3 conda-forge
libblas 3.9.0 26_linux64_openblas conda-forge
libbrotlicommon 1.1.0 hd590300_1 conda-forge
libbrotlidec 1.1.0 hd590300_1 conda-forge
libbrotlienc 1.1.0 hd590300_1 conda-forge
libcap 2.71 h39aace5_0 conda-forge
libcblas 3.9.0 26_linux64_openblas conda-forge
libclang-cpp15 15.0.7 default_h127d8a8_5 conda-forge
libclang13 18.1.7 default_h087397f_0 conda-forge
libcrc32c 1.1.2 h9c3ff4c_0 conda-forge
libcups 2.3.3 h4637d8d_4 conda-forge
libcurl 8.8.0 hca28451_1 conda-forge
libdeflate 1.20 hd590300_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libevent 2.1.12 hf998b51_1 conda-forge
libexpat 2.6.4 h5888daf_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libflac 1.4.3 h59595ed_0 conda-forge
libgcc 14.2.0 h77fa898_1 conda-forge
libgcc-ng 14.2.0 h69a702a_1 conda-forge
libgcrypt-lib 1.11.0 hb9d3cd8_2 conda-forge
libgettextpo 0.22.5 he02047a_3 conda-forge
libgettextpo-devel 0.22.5 he02047a_3 conda-forge
libgfortran 14.2.0 h69a702a_1 conda-forge
libgfortran-ng 14.2.0 h69a702a_1 conda-forge
libgfortran5 14.2.0 hd5240d6_1 conda-forge
libglib 2.80.2 hf974151_0 conda-forge
libgomp 14.2.0 h77fa898_1 conda-forge
libgoogle-cloud 2.24.0 h2736e30_0 conda-forge
libgoogle-cloud-storage 2.24.0 h3d9a0c8_0 conda-forge
libgpg-error 1.51 hbd13f7d_1 conda-forge
libgrpc 1.62.2 h15f2491_0 conda-forge
libiconv 1.17 hd590300_2 conda-forge
libjpeg-turbo 3.0.0 hd590300_1 conda-forge
liblapack 3.9.0 26_linux64_openblas conda-forge
libllvm10 10.0.1 he513fc3_3 conda-forge
libllvm15 15.0.7 hb3ce162_4 conda-forge
libllvm18 18.1.7 hb77312f_0 conda-forge
liblzma 5.6.3 hb9d3cd8_1 conda-forge
liblzma-devel 5.6.3 hb9d3cd8_1 conda-forge
libnghttp2 1.58.0 h47da74e_1 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libogg 1.3.5 h4ab18f5_0 conda-forge
libopenblas 0.3.28 pthreads_h94d23a6_1 conda-forge
libopus 1.3.1 h7f98852_1 conda-forge
libparquet 16.1.0 h6a7eafb_6_cpu conda-forge
libpng 1.6.43 h2797004_0 conda-forge
libpq 16.6 h035377e_1 conda-forge
libprotobuf 4.25.3 h08a7969_0 conda-forge
libre2-11 2023.09.01 h5a48ba9_2 conda-forge
libsndfile 1.2.2 hc60ed4a_1 conda-forge
libsqlite 3.46.0 hde9e2c9_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx 14.2.0 hc0a3c3a_1 conda-forge
libstdcxx-ng 14.2.0 h4852527_1 conda-forge
libsystemd0 256.9 h2774228_0 conda-forge
libthrift 0.19.0 hb90f79a_1 conda-forge
libtiff 4.6.0 h1dd3fc0_3 conda-forge
libtorch 2.4.0 cpu_generic_h4a3044c_1 conda-forge
libutf8proc 2.8.0 hf23e847_1 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libuv 1.49.2 hb9d3cd8_0 conda-forge
libvorbis 1.3.7 h9c3ff4c_0 conda-forge
libwebp-base 1.4.0 hd590300_0 conda-forge
libxcb 1.15 h0b41bf4_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxkbcommon 1.7.0 h662e7e4_0 conda-forge
libxml2 2.12.7 hc051c1a_1 conda-forge
libzlib 1.2.13 h4ab18f5_6 conda-forge
llvmlite 0.36.0 py38h4630a5e_0 conda-forge
locket 1.0.0 pyhd8ed1ab_0 conda-forge
lz4 4.3.3 py38hdcd8cb4_0 conda-forge
lz4-c 1.9.4 hcb278e6_0 conda-forge
markdown 3.6 pyhd8ed1ab_0 conda-forge
markupsafe 2.1.5 py38h01eb140_0 conda-forge
matplotlib 3.7.3 py38h578d9bd_0 conda-forge
matplotlib-base 3.7.3 py38h58ed7fa_0 conda-forge
mpc 1.3.1 h24ddda3_1 conda-forge
mpfr 4.2.1 h90cbb55_3 conda-forge
mpg123 1.32.9 hc50e24c_0 conda-forge
mpmath 1.3.0 pyhd8ed1ab_0 conda-forge
msgpack-python 1.0.8 py38hea7755e_0 conda-forge
msprime 1.3.1 py38h50512c5_1 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
mysql-common 8.3.0 hf1915f5_4 conda-forge
mysql-libs 8.3.0 hca2cd23_4 conda-forge
ncurses 6.5 he02047a_1 conda-forge
networkx 3.1 pyhd8ed1ab_0 conda-forge
newick 1.9.0 pypi_0 pypi
nomkl 1.0 h5ca1d4c_0 conda-forge
nose 1.3.7 py_1006 conda-forge
nspr 4.36 h5888daf_0 conda-forge
nss 3.100 hca3bf56_0 conda-forge
numba 0.53.1 py38ha9443f7_0
numcodecs 0.12.1 py38h854fd01_1 conda-forge
numexpr 2.8.4 py38hb2af0cf_101 conda-forge
numpy 1.23.5 py38h7042d01_0 conda-forge
openblas 0.3.28 pthreads_h6ec200e_1 conda-forge
openjpeg 2.5.2 h488ebb8_0 conda-forge
openssl 3.4.0 hb9d3cd8_0 conda-forge
opt_einsum 3.4.0 pyhd8ed1ab_0 conda-forge
orc 2.0.1 h17fec99_1 conda-forge
packaging 24.2 pyhd8ed1ab_2 conda-forge
pandas 2.0.3 py38h01efb38_1 conda-forge
partd 1.4.1 pyhd8ed1ab_0 conda-forge
patsy 0.5.6 pyhd8ed1ab_0 conda-forge
pcre2 10.43 hcad00b1_0 conda-forge
pillow 10.3.0 py38h9e66945_0 conda-forge
pip 24.3.1 pyh8b19718_0 conda-forge
pixman 0.44.2 h29eaf8c_0 conda-forge
pkgutil-resolve-name 1.3.10 pyhd8ed1ab_1 conda-forge
platformdirs 4.3.6 pyhd8ed1ab_0 conda-forge
ply 3.11 pyhd8ed1ab_2 conda-forge
pomegranate 1.0.0 pyhd8ed1ab_1 conda-forge
pooch 1.8.2 pyhd8ed1ab_0 conda-forge
protobuf 4.25.3 py38hb5c7596_0 conda-forge
psutil 6.0.0 py38hfb59056_0 conda-forge
pthread-stubs 0.4 hb9d3cd8_1002 conda-forge
pulseaudio-client 17.0 hb77b528_0 conda-forge
pyarrow 16.1.0 py38hb563948_2 conda-forge
pyarrow-core 16.1.0 py38he753e70_2_cpu conda-forge
pycparser 2.22 pyhd8ed1ab_0 conda-forge
pyparsing 3.1.4 pyhd8ed1ab_0 conda-forge
pyqt 5.15.9 py38hffdaa6c_5 conda-forge
pyqt5-sip 12.12.2 py38h17151c0_5 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.8.19 hd12c33a_0_cpython conda-forge
python-dateutil 2.9.0 pyhd8ed1ab_0 conda-forge
python-tzdata 2024.2 pyhd8ed1ab_0 conda-forge
python_abi 3.8 5_cp38 conda-forge
pytorch 2.4.0 cpu_generic_py38hbd07d99_1 conda-forge
pytz 2024.2 pyhd8ed1ab_0 conda-forge
pyyaml 6.0.2 py38h2019614_0 conda-forge
qt-main 5.15.8 hc9dc06e_21 conda-forge
re2 2023.09.01 h7f4b329_2 conda-forge
readline 8.2 h8228510_1 conda-forge
referencing 0.35.1 pyhd8ed1ab_0 conda-forge
relernn 0.2 pypi_0 pypi
requests 2.32.3 pyhd8ed1ab_0 conda-forge
rpds-py 0.20.0 py38h4005ec7_0 conda-forge
ruamel.yaml 0.18.6 py38h01eb140_0 conda-forge
ruamel.yaml.clib 0.2.8 py38h01eb140_0 conda-forge
s2n 1.4.15 he19d79f_0 conda-forge
scikit-allel 1.3.7 py38h53bb729_1 conda-forge
scikit-learn 1.3.2 py38ha25d942_2 conda-forge
scipy 1.10.1 py38h32ae08f_1
seaborn 0.13.2 hd8ed1ab_2 conda-forge
seaborn-base 0.13.2 pyhd8ed1ab_2 conda-forge
setuptools 75.3.0 pyhd8ed1ab_0 conda-forge
sip 6.7.12 py38h17151c0_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sleef 3.7 h1b44611_2 conda-forge
snappy 1.2.1 h8bd8927_1 conda-forge
sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge
statsmodels 0.14.1 py38h7f0c24c_0 conda-forge
svgwrite 1.4.3 pyhd8ed1ab_0 conda-forge
sympy 1.13.3 pypyh2585a3b_103 conda-forge
tbb 2020.3 hfd86e86_0
tblib 3.0.0 pyhd8ed1ab_0 conda-forge
tensorboard 2.17.1 pyhd8ed1ab_0 conda-forge
tensorboard-data-server 0.7.0 py38hcdda232_1 conda-forge
tensorflow 2.2.0 mkl_py38h6d3daf0_0
tensorflow-base 2.2.0 mkl_py38h5059a2d_0
tensorflow-estimator 2.6.0 py38h709712a_0 conda-forge
termcolor 2.4.0 pyhd8ed1ab_0 conda-forge
threadpoolctl 3.5.0 pyhc1e730c_0 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
toml 0.10.2 pyhd8ed1ab_0 conda-forge
tomli 2.0.2 pyhd8ed1ab_0 conda-forge
toolz 1.0.0 pyhd8ed1ab_0 conda-forge
tornado 6.4.1 py38hfb59056_0 conda-forge
tqdm 4.67.1 pyhd8ed1ab_0 conda-forge
tskit 0.5.6 py38he82f83a_2 conda-forge
typing_extensions 4.12.2 pyha770c72_0 conda-forge
tzdata 2024b hc8b5060_0 conda-forge
unicodedata2 15.1.0 py38h01eb140_0 conda-forge
urllib3 2.2.3 pyhd8ed1ab_0 conda-forge
werkzeug 3.0.6 pyhd8ed1ab_0 conda-forge
wheel 0.45.1 pyhd8ed1ab_0 conda-forge
wrapt 1.16.0 py38h01eb140_0 conda-forge
xcb-util 0.4.0 hd590300_1 conda-forge
xcb-util-image 0.4.0 h8ee46fc_1 conda-forge
xcb-util-keysyms 0.4.0 h8ee46fc_1 conda-forge
xcb-util-renderutil 0.3.9 hd590300_1 conda-forge
xcb-util-wm 0.4.1 h8ee46fc_1 conda-forge
xkeyboard-config 2.42 h4ab18f5_0 conda-forge
xorg-kbproto 1.0.7 hb9d3cd8_1003 conda-forge
xorg-libice 1.1.2 hb9d3cd8_0 conda-forge
xorg-libsm 1.2.5 he73a12e_0 conda-forge
xorg-libx11 1.8.9 h8ee46fc_0 conda-forge
xorg-libxau 1.0.12 hb9d3cd8_0 conda-forge
xorg-libxdmcp 1.1.5 hb9d3cd8_0 conda-forge
xorg-libxext 1.3.4 h0b41bf4_2 conda-forge
xorg-libxrender 0.9.11 hd590300_0 conda-forge
xorg-renderproto 0.11.1 hb9d3cd8_1003 conda-forge
xorg-xextproto 7.3.0 hb9d3cd8_1004 conda-forge
xorg-xf86vidmodeproto 2.3.1 hb9d3cd8_1005 conda-forge
xorg-xproto 7.0.31 hb9d3cd8_1008 conda-forge
xyzservices 2024.9.0 pyhd8ed1ab_1 conda-forge
xz 5.6.3 hbcc6ac9_1 conda-forge
xz-gpl-tools 5.6.3 hbcc6ac9_1 conda-forge
xz-tools 5.6.3 hb9d3cd8_1 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
zarr 2.17.1 pyhd8ed1ab_0 conda-forge
zict 3.0.0 pyhd8ed1ab_0 conda-forge
zipp 3.21.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.13 h4ab18f5_6 conda-forge
zstandard 0.23.0 py38h62bed22_0 conda-forge
zstd 1.5.6 ha6fb4c9_0 conda-forge
looks like this might be caused by the older version of tensorflow you are using. try these steps from within the ReLERNN directory
# 1. create a new conda env, activate it
conda create -n relernn_test python=3.10 --yes
conda activate relernn_test
# 2. confirm pip is pointing to this env
which pip
# 3. use that pip to install everything for this repo
pip install .
# 4. test this installation
cd examples
./example_pipeline.sh
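Step 2 can also be checked from Python, which helps when the shell resolves `pip` to a different environment than the interpreter. The sketch below compares the first `pip` on the PATH with `sys.prefix`. The helper names are hypothetical, and the comparison resolves symlinks, so an unusual setup where `pip` is a symlink pointing outside the environment would be reported as a mismatch.

```python
import os
import shutil
import sys


def is_inside(path, prefix):
    """Return True if `path` lives under the directory `prefix`."""
    path = os.path.realpath(path)
    prefix = os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


def pip_matches_interpreter():
    """Return True if the first `pip` on PATH belongs to the running interpreter's env."""
    pip_path = shutil.which("pip")
    if pip_path is None:
        return False
    return is_inside(pip_path, sys.prefix)


if __name__ == "__main__":
    print("pip on PATH:", shutil.which("pip"))
    print("sys.prefix: ", sys.prefix)
    print("match:", pip_matches_interpreter())
```

Running `python -m pip install .` instead of `pip install .` sidesteps the question, since it always uses the pip that belongs to that interpreter.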
Hi Andrew,
Thanks for the help!
I tried the commands you suggested, and here is the output:
ModuleNotFoundError: No module named 'h5py'
So I installed h5py into this test env and reran the test script:
ModuleNotFoundError: No module named 'tensorflow'
I then installed tensorflow via conda/mamba into this test env (which downgraded numpy from 2.2.2 to 1.26.4) and tested again. This time ReLERNN was able to begin running:
...
Total params: 771,889 (2.94 MB)
Trainable params: 771,889 (2.94 MB)
Non-trainable params: 0 (0.00 B)
Traceback (most recent call last):
File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_TRAIN", line 130, in <module>
main()
File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_TRAIN", line 109, in main
runModels(ModelFuncPointer=GRU_TUNED84,
File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/ReLERNN/helpers.py", line 370, in runModels
history = model.fit(TrainGenerator,
File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 117, in error_handler
return fn(*args, **kwargs)
TypeError: TensorFlowTrainer.fit() got an unexpected keyword argument 'use_multiprocessing'
2025-01-23 12:39:33.934900: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Importing HDF5: "./example_output/splitVCFs/example_2L:0-840000.hdf5"...
Traceback (most recent call last):
File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_PREDICT", line 155, in <module>
main()
File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_PREDICT", line 122, in main
load_and_predictVCF(VCFGenerator=vcf_gen,
File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/ReLERNN/helpers.py", line 284, in load_and_predictVCF
jsonFILE = open(network[0],"r")
FileNotFoundError: [Errno 2] No such file or directory: './example_output/networks/model.json'
2025-01-23 12:39:38.412544: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Error: no .PREDICT.txt file found. You must run ReLERNN_PREDICT.py prior to running ReLERNN_BSCORRECT.py
I'm not sure if you have seen this error before, but I appreciate any help!
Regards, Nil
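The `use_multiprocessing` TypeError above is consistent with a Keras release whose `fit()` no longer accepts that keyword; the `keras/src/...` paths in the traceback suggest Keras 3, though that reading is an assumption. One possible workaround, sketched below with a hypothetical helper that is not part of ReLERNN, is to drop any keyword arguments the target function does not declare before calling it. Whether `inspect.signature` sees through Keras's own wrappers around `fit()` has not been verified here.

```python
import inspect


def call_with_supported_kwargs(fn, *args, **kwargs):
    """Call fn, silently dropping keyword arguments its signature does not accept."""
    params = inspect.signature(fn).parameters
    accepts_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_any:
        # fn declares **kwargs, so pass everything through unchanged.
        return fn(*args, **kwargs)
    supported = {k: v for k, v in kwargs.items() if k in params}
    return fn(*args, **supported)


# Stand-in for a fit() that has no use_multiprocessing parameter.
def newer_fit(data, epochs=1):
    return ("fit", data, epochs)


result = call_with_supported_kwargs(
    newer_fit, "train_data", epochs=3, use_multiprocessing=True
)
print(result)
```

Dropping an argument silently changes behavior (here, multiprocessing would simply not be used), so pinning the TensorFlow/Keras versions the package was written for is the more conservative route.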
hello @NilaBlueshirt - it sounds like you have a python environment issue.
assuming you are working on a linux machine, I recommend the same steps as above. After you have cloned this repo, cd to the directory and then:
# 1. create a new conda env, activate it
conda create -n relernn_test python=3.10 --yes
conda activate relernn_test
# 2. confirm pip is pointing to this env
which pip
# 3. use that pip to install everything for this repo
pip install .
# 4. test this installation
cd examples
./example_pipeline.sh
this will definitely install tensorflow and h5py, among other packages and the example_pipeline.sh workflow should run
Hi @andrewkern, thanks for getting back to me. The errors I described in my first reply were generated after your step 4. Here are more details:
$ mamba create -n relearnn-1.0.0 -c conda-forge python=3.10 -y
...
$ source activate relearnn-1.0.0
$ which pip
/packages/envs/relearnn-1.0.0/bin/pip
$ which python
/packages/envs/relearnn-1.0.0/bin/python
$ pip install .
...
Building wheels for collected packages: ReLERNN
Building wheel for ReLERNN (setup.py) ... done
Created wheel for ReLERNN: filename=ReLERNN-0.2-py3-none-any.whl size=44449 sha256=9494e619a61fab00d86cc99320707b520aec017da8a7aaa95a9c685169c7e754
Stored in directory: /tmp/pip-ephem-wheel-cache-hnqqww0b/wheels/5a/90/be/ab9f318b7c8a7e520edb7963bf25c02a03f3a447944c7aa6b7
Successfully built ReLERNN
Installing collected packages: zipp, typing-extensions, toolz, threadpoolctl, svgwrite, six, ruamel.yaml.clib, rpds-py, pyyaml, pyparsing, pillow, packaging, numpy, newick, locket, kiwisolver, joblib, fsspec, fonttools, cycler, cloudpickle, click, attrs, scipy, ruamel.yaml, referencing, python-dateutil, partd, importlib_metadata, contourpy, scikit-learn, matplotlib, jsonschema-specifications, demes, dask, jsonschema, tskit, scikit-allel, msprime, ReLERNN
Successfully installed ReLERNN-0.2 attrs-25.1.0 click-8.1.8 cloudpickle-3.1.1 contourpy-1.3.1 cycler-0.12.1 dask-2025.1.0 demes-0.2.3 fonttools-4.55.6 fsspec-2024.12.0 importlib_metadata-8.6.1 joblib-1.4.2 jsonschema-4.23.0 jsonschema-specifications-2024.10.1 kiwisolver-1.4.8 locket-1.0.0 matplotlib-3.10.0 msprime-1.3.3 newick-1.9.0 numpy-2.2.2 packaging-24.2 partd-1.4.2 pillow-11.1.0 pyparsing-3.2.1 python-dateutil-2.9.0.post0 pyyaml-6.0.2 referencing-0.36.2 rpds-py-0.22.3 ruamel.yaml-0.18.10 ruamel.yaml.clib-0.2.12 scikit-allel-1.3.13 scikit-learn-1.6.1 scipy-1.15.1 six-1.17.0 svgwrite-1.4.3 threadpoolctl-3.5.0 toolz-1.0.0 tskit-0.6.0 typing-extensions-4.12.2 zipp-3.21.0
$ cd examples
$ ./example_pipeline.sh
Traceback (most recent call last):
File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_SIMULATE", line 7, in <module>
from ReLERNN.imports import *
File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/ReLERNN/__init__.py", line 3, in <module>
from ReLERNN.imports import *
File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/ReLERNN/imports.py", line 12, in <module>
import h5py
ModuleNotFoundError: No module named 'h5py'
That's why I had to manually install h5py and the other missing packages. My apologies for the confusion, and thanks again for helping us.
Regards, Nil
something isn't going right here, your pip install . call doesn't seem to be reading all the correct requirements from the setup.py. do you have the newest version of relernn from the repo?
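To see which requirements an installed copy of the package actually declares, the metadata written by `pip install .` can be inspected directly. The sketch below uses the standard-library `importlib.metadata.requires`. The helper names and the list of expected packages are illustrative assumptions, based only on the two modules that were reported missing in this thread.

```python
import re
from importlib import metadata


def requirement_names(requires):
    """Extract bare, normalized package names from requirement strings."""
    names = set()
    for line in requires or []:
        match = re.match(r"[A-Za-z0-9._-]+", line.strip())
        if match:
            names.add(match.group(0).lower().replace("_", "-"))
    return names


def missing_requirements(requires, expected):
    """Return the expected packages that are absent from the declared requirements."""
    declared = requirement_names(requires)
    return sorted(
        name for name in expected
        if name.lower().replace("_", "-") not in declared
    )


if __name__ == "__main__":
    try:
        declared = metadata.requires("ReLERNN")  # None if nothing is declared
    except metadata.PackageNotFoundError:
        declared = None
        print("ReLERNN is not installed in this environment")
    print("missing:", missing_requirements(declared, ["tensorflow", "h5py"]))
```

An empty `missing` list means the installed metadata declares both packages; a non-empty list points at the checkout that was installed, not at pip.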
Thanks for the quick reply! Yes, I cloned the main branch of the repo last week. I noticed that there is a setup_fix branch, should I be using that one?
ack i think we had a commit that hadn't hit the main branch. please clone the repo and try these same steps again.
The pip install output seems to be good now:
Building wheels for collected packages: ReLERNN
Building wheel for ReLERNN (setup.py) ... done
Created wheel for ReLERNN: filename=ReLERNN-0.2-py3-none-any.whl size=44461 sha256=53aab229bed27a9dc80af6abb3d095583ea33dadd532f4dc48fbb38449f15248
Stored in directory: /tmp/pip-ephem-wheel-cache-albzfg84/wheels/5a/90/be/ab9f318b7c8a7e520edb7963bf25c02a03f3a447944c7aa6b7
Successfully built ReLERNN
Installing collected packages: libclang, flatbuffers, zipp, wrapt, urllib3, typing-extensions, toolz, threadpoolctl, termcolor, tensorflow-io-gcs-filesystem, tensorflow-estimator, tensorboard-data-server, svgwrite, six, ruamel.yaml.clib, rpds-py, pyyaml, pyparsing, pyasn1, protobuf, pillow, packaging, opt-einsum, oauthlib, numpy, newick, MarkupSafe, markdown, locket, kiwisolver, keras, joblib, idna, grpcio, gast, fsspec, fonttools, cycler, cloudpickle, click, charset-normalizer, certifi, cachetools, attrs, absl-py, werkzeug, scipy, ruamel.yaml, rsa, requests, referencing, python-dateutil, pyasn1-modules, partd, ml-dtypes, importlib_metadata, h5py, google-pasta, contourpy, astunparse, scikit-learn, requests-oauthlib, matplotlib, jsonschema-specifications, google-auth, demes, dask, jsonschema, google-auth-oauthlib, tskit, tensorboard, scikit-allel, tensorflow, msprime, ReLERNN
Successfully installed MarkupSafe-3.0.2 ReLERNN-0.2 absl-py-2.1.0 astunparse-1.6.3 attrs-25.1.0 cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 cloudpickle-3.1.1 contourpy-1.3.1 cycler-0.12.1 dask-2025.1.0 demes-0.2.3 flatbuffers-25.1.24 fonttools-4.55.6 fsspec-2024.12.0 gast-0.6.0 google-auth-2.38.0 google-auth-oauthlib-1.2.1 google-pasta-0.2.0 grpcio-1.70.0 h5py-3.12.1 idna-3.10 importlib_metadata-8.6.1 joblib-1.4.2 jsonschema-4.23.0 jsonschema-specifications-2024.10.1 keras-2.15.0 kiwisolver-1.4.8 libclang-18.1.1 locket-1.0.0 markdown-3.7 matplotlib-3.10.0 ml-dtypes-0.2.0 msprime-1.3.3 newick-1.9.0 numpy-1.26.4 oauthlib-3.2.2 opt-einsum-3.4.0 packaging-24.2 partd-1.4.2 pillow-11.1.0 protobuf-4.25.6 pyasn1-0.6.1 pyasn1-modules-0.4.1 pyparsing-3.2.1 python-dateutil-2.9.0.post0 pyyaml-6.0.2 referencing-0.36.2 requests-2.32.3 requests-oauthlib-2.0.0 rpds-py-0.22.3 rsa-4.9 ruamel.yaml-0.18.10 ruamel.yaml.clib-0.2.12 scikit-allel-1.3.13 scikit-learn-1.6.1 scipy-1.15.1 six-1.17.0 svgwrite-1.4.3 tensorboard-2.15.2 tensorboard-data-server-0.7.2 tensorflow-2.15.0 tensorflow-estimator-2.15.0 tensorflow-io-gcs-filesystem-0.37.1 termcolor-2.5.0 threadpoolctl-3.5.0 toolz-1.0.0 tskit-0.6.0 typing-extensions-4.12.2 urllib3-2.3.0 werkzeug-3.1.3 wrapt-1.14.1 zipp-3.21.0
The machine I'm using has 4 A100s and has cuda 12.5 installed. However, when I ran example_pipeline.sh, the first few lines of the output were:
2025-01-27 17:08:55.511310: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-27 17:08:56.091919: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-27 17:08:56.092003: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-27 17:08:56.210692: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-27 17:08:56.456773: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-27 17:08:56.458393: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-27 17:08:58.746718: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Warning: no demographic history file found. All training data will be simulated under demographic equilibrium.
Split chromosome: 2L...
Split chromosome: 2R...
Split chromosome: 3L...
Split chromosome: 3R...
Split chromosome: X...
I'm worried that the tensorflow installed here is the CPU version, not the GPU version. So I made this testing script to test the env:
import tensorflow as tf
# Check TensorFlow version
print("TensorFlow version:", tf.__version__)
# Check if a GPU is available
print("GPU available:", tf.config.list_physical_devices('GPU'))
When I run this script within the relearnn-1.0.0 env I just made, the output shows errors similar to those above:
2025-01-27 17:24:50.843216: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-27 17:24:50.871853: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-27 17:24:50.871891: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-27 17:24:50.872883: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-27 17:24:50.877749: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-27 17:24:50.877920: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-27 17:24:52.765075: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
TensorFlow version: 2.15.0
2025-01-27 17:24:56.090959: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2256] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
GPU available: []
Thanks for your efforts.
This looks like CUDA isn't installed on your system. Do you know if it is?
If it isn't, you can try the following in your relernn env:
python3 -m pip install 'tensorflow[and-cuda]'
# Verify the installation:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
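A slightly more detailed check can help separate "the driver is missing" from "this TensorFlow wheel was built without CUDA". The sketch below is only an illustration: the function name `describe_tensorflow_build` is made up for this example, and it relies on `tf.test.is_built_with_cuda()` and `tf.sysconfig.get_build_info()`, which exist in recent TensorFlow 2.x releases. Note that the CUDA version printed by nvidia-smi is the highest version the driver supports, not proof that the CUDA runtime libraries TensorFlow needs are installed.

```python
import importlib.util


def describe_tensorflow_build():
    """Return a small dict describing the installed TensorFlow build, if any."""
    # Hypothetical helper for this thread; not part of ReLERNN or TensorFlow.
    if importlib.util.find_spec("tensorflow") is None:
        return {"installed": False}

    import tensorflow as tf

    info = {
        "installed": True,
        "version": tf.__version__,
        # False here means a CPU-only build: no driver setup will make a GPU appear.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }
    # get_build_info() reports the CUDA/cuDNN versions the wheel was compiled against;
    # the keys may be absent on CPU-only builds, hence .get().
    build = tf.sysconfig.get_build_info()
    info["cuda_version"] = build.get("cuda_version")
    info["cudnn_version"] = build.get("cudnn_version")
    return info


if __name__ == "__main__":
    for key, value in describe_tensorflow_build().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is True but `gpus` is empty, the likely culprit is missing or mismatched CUDA/cuDNN libraries rather than the TensorFlow package itself.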
I thought my CUDA 12.5 was working, since it shows up in the nvidia-smi output. But here is what I got from running the commands you suggested (in the same relearnn-1.0.0 env):
$ python3 -m pip install 'tensorflow[and-cuda]'
Installing collected packages: namex, pygments, optree, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-nvcc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, ml-dtypes, mdurl, tensorboard, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, markdown-it-py, rich, nvidia-cusolver-cu12, keras, tensorflow
Attempting uninstall: ml-dtypes
Found existing installation: ml-dtypes 0.2.0
Uninstalling ml-dtypes-0.2.0:
Successfully uninstalled ml-dtypes-0.2.0
Attempting uninstall: tensorboard
Found existing installation: tensorboard 2.15.2
Uninstalling tensorboard-2.15.2:
Successfully uninstalled tensorboard-2.15.2
Attempting uninstall: keras
Found existing installation: keras 2.15.0
Uninstalling keras-2.15.0:
Successfully uninstalled keras-2.15.0
Attempting uninstall: tensorflow
Found existing installation: tensorflow 2.15.0
Uninstalling tensorflow-2.15.0:
Successfully uninstalled tensorflow-2.15.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
relernn 0.2 requires tensorflow==2.15.0, but you have tensorflow 2.18.0 which is incompatible.
Successfully installed keras-3.8.0 markdown-it-py-3.0.0 mdurl-0.1.2 ml-dtypes-0.4.1 namex-0.0.8 nvidia-cublas-cu12-12.5.3.2 nvidia-cuda-cupti-cu12-12.5.82 nvidia-cuda-nvcc-cu12-12.5.82 nvidia-cuda-nvrtc-cu12-12.5.82 nvidia-cuda-runtime-cu12-12.5.82 nvidia-cudnn-cu12-9.3.0.75 nvidia-cufft-cu12-11.2.3.61 nvidia-curand-cu12-10.3.6.82 nvidia-cusolver-cu12-11.6.3.83 nvidia-cusparse-cu12-12.5.1.3 nvidia-nccl-cu12-2.21.5 nvidia-nvjitlink-cu12-12.5.82 optree-0.14.0 pygments-2.19.1 rich-13.9.4 tensorboard-2.18.0 tensorflow-2.18.0
And then the test says:
$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2025-01-27 18:34:31.317555: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738028071.486937 75281 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738028071.534654 75281 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-27 18:34:31.956131: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
Would it be OK to change the line "tensorflow==2.15.0" to "tensorflow[and-cuda]"?
I then re-ran example_pipeline.sh; here is the full output:
2025-01-27 18:46:39.125539: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738028799.139504 75619 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738028799.143695 75619 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-27 18:46:39.159773: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Warning: no demographic history file found. All training data will be simulated under demographic equilibrium.
Split chromosome: 2R...
Split chromosome: 2L...
Split chromosome: 3L...
Split chromosome: 3R...
Split chromosome: X...
Converting ./example_output/splitVCFs/example_2L:0-840000.vcf to HDF5...
Converting ./example_output/splitVCFs/example_2R:0-1669000.vcf to HDF5...
Converting ./example_output/splitVCFs/example_3R:0-1963000.vcf to HDF5...
Converting ./example_output/splitVCFs/example_X:0-1250000.vcf to HDF5...
Converting ./example_output/splitVCFs/example_3L:0-742000.vcf to HDF5...
Reading HDF5: "./example_output/splitVCFs/example_2L:0-840000.hdf5"...
Reading HDF5: "./example_output/splitVCFs/example_2R:0-1669000.hdf5"...
Reading HDF5: "./example_output/splitVCFs/example_3L:0-742000.hdf5"...
Reading HDF5: "./example_output/splitVCFs/example_3R:0-1963000.hdf5"...
Reading HDF5: "./example_output/splitVCFs/example_X:0-1250000.hdf5"...
Accessibility mask found: calculating the proportion of the genome that is masked...
1.3% of genome inaccessible
Simulating with window size = 211000 bp.
Training set:
Simulate...
Validation set:
Simulate...
Test set:
Simulate...
SIMULATIONS FINISHED!
SANITY CHECK
====================
numSegSites Min Mean Max
Simulated: 145 998 2498
InputVCF 2L:0-840000: 238 909 1741
InputVCF 2R:0-1669000: 411 1000 1754
InputVCF 3L:0-742000: 143 909 1777
InputVCF 3R:0-1963000: 358 1000 1759
InputVCF X:0-1250000: 127 1000 1720
***ReLERNN_SIMULATE.py FINISHED!***
2025-01-27 18:47:58.679078: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738028878.693167 76377 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738028878.697448 76377 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-27 18:47:58.713708: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
I0000 00:00:1738028882.734427 76377 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 79197 MB memory: -> device: 0, name: NVIDIA A100-SXM4-80GB, pci bus id: 0000:01:00.0, compute capability: 8.0
I0000 00:00:1738028882.897592 76377 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 79197 MB memory: -> device: 0, name: NVIDIA A100-SXM4-80GB, pci bus id: 0000:01:00.0, compute capability: 8.0
Model: "functional"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃ Connected to ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ input_layer (InputLayer) │ (None, 2508, 20) │ 0 │ - │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ bidirectional (Bidirectional) │ (None, 168) │ 53,424 │ input_layer[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dense (Dense) │ (None, 256) │ 43,264 │ bidirectional[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ input_layer_1 (InputLayer) │ (None, 2508) │ 0 │ - │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout (Dropout) │ (None, 256) │ 0 │ dense[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dense_1 (Dense) │ (None, 256) │ 642,304 │ input_layer_1[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ concatenate (Concatenate) │ (None, 512) │ 0 │ dropout[0][0], │
│ │ │ │ dense_1[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dense_2 (Dense) │ (None, 64) │ 32,832 │ concatenate[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_1 (Dropout) │ (None, 64) │ 0 │ dense_2[0][0] │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dense_3 (Dense) │ (None, 1) │ 65 │ dropout_1[0][0] │
└───────────────────────────────┴───────────────────────────┴─────────────────┴────────────────────────────┘
Total params: 771,889 (2.94 MB)
Trainable params: 771,889 (2.94 MB)
Non-trainable params: 0 (0.00 B)
Traceback (most recent call last):
File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_TRAIN", line 130, in <module>
main()
File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_TRAIN", line 109, in main
runModels(ModelFuncPointer=GRU_TUNED84,
File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/ReLERNN/helpers.py", line 370, in runModels
history = model.fit(TrainGenerator,
File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 117, in error_handler
return fn(*args, **kwargs)
TypeError: TensorFlowTrainer.fit() got an unexpected keyword argument 'use_multiprocessing'
2025-01-27 18:48:07.889282: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738028887.902228 76674 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738028887.906192 76674 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-27 18:48:07.920394: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Importing HDF5: "./example_output/splitVCFs/example_2L:0-840000.hdf5"...
I0000 00:00:1738028890.473773 76674 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 79197 MB memory: -> device: 0, name: NVIDIA A100-SXM4-80GB, pci bus id: 0000:01:00.0, compute capability: 8.0
Traceback (most recent call last):
File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_PREDICT", line 155, in <module>
main()
File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_PREDICT", line 122, in main
load_and_predictVCF(VCFGenerator=vcf_gen,
File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/ReLERNN/helpers.py", line 284, in load_and_predictVCF
jsonFILE = open(network[0],"r")
FileNotFoundError: [Errno 2] No such file or directory: './example_output/networks/model.json'
2025-01-27 18:48:13.197334: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738028893.211527 76877 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738028893.215825 76877 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-27 18:48:13.232331: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Error: no .PREDICT.txt file found. You must run ReLERNN_PREDICT.py prior to running ReLERNN_BSCORRECT.py
Thanks.
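A note on the TypeError in the output above: the pip log shows keras-3.8.0 being installed alongside tensorflow 2.18.0, and in Keras 3 `Model.fit()` no longer accepts `use_multiprocessing` (nor `workers` or `max_queue_size`; those moved to the `keras.utils.PyDataset` constructor). Staying on the pinned tensorflow==2.15.0 avoids the problem. Purely as an illustration of how a caller could tolerate both signatures, here is a minimal sketch that drops keyword arguments a function does not declare. The helper `keep_supported_kwargs` and the two stand-in `fit_*` functions are hypothetical and are not ReLERNN or Keras code; this has not been tested against a real Keras model.

```python
import inspect


def keep_supported_kwargs(func, kwargs):
    """Return only the entries of kwargs that func declares as parameters."""
    params = inspect.signature(func).parameters
    # A **kwargs catch-all means we cannot tell what is accepted; pass everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


# Stand-ins for the two fit() signatures; these are NOT the real Keras methods.
def fit_keras2_style(x, epochs=1, workers=1, use_multiprocessing=False):
    return "ok"


def fit_keras3_style(x, epochs=1):
    return "ok"


requested = {"epochs": 5, "workers": 4, "use_multiprocessing": True}
print(keep_supported_kwargs(fit_keras2_style, requested))
print(keep_supported_kwargs(fit_keras3_style, requested))
```

With the Keras 2 style stand-in all three arguments survive; with the Keras 3 style stand-in only `epochs` does, so the call no longer raises a TypeError.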
So I'm betting this is because you installed a different version of TensorFlow when you installed it with CUDA. What version does it say you have? Basically, the pipeline is looking for a model file at './example_output/networks/model.json'. What files do you see in that directory?
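Both questions can be answered with a few lines of Python run inside the env. This is a small sketch, not part of ReLERNN: the helper names are invented for illustration, and the `networks` subdirectory and `model.json` filename are taken from the FileNotFoundError above.

```python
from importlib import metadata
from pathlib import Path


def installed_version(package):
    """Return the installed version string for a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def list_network_files(project_dir):
    """Return the sorted file names under <project_dir>/networks (empty if missing)."""
    networks = Path(project_dir) / "networks"
    if not networks.is_dir():
        return []
    return sorted(p.name for p in networks.iterdir())


if __name__ == "__main__":
    # The pip log above shows relernn 0.2 pinning tensorflow==2.15.0.
    for pkg in ("tensorflow", "keras"):
        print(pkg, installed_version(pkg))
    files = list_network_files("./example_output")
    print("networks/:", files)
    print("model.json present:", "model.json" in files)
```

If `model.json` is absent, the training step did not finish, and the later PREDICT and BSCORRECT errors are just knock-on effects of that first failure.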
Hi Andrew,
You are right, it was TensorFlow and CUDA not playing nicely on my side. I resolved the CUDA version error, and despite the CUDA warning messages, example_pipeline.sh runs perfectly fine on a GPU. Thanks again for all your efforts!
Here is what I did in case someone else wants to manage the dependencies with mamba instead of pip:
$ git clone https://github.com/kr-colab/ReLERNN.git
$ cd ReLERNN
$ mamba create -n relearnn-1.0.0 -c conda-forge -c nvidia python=3.10 tensorflow=2.15.0 cuda-toolkit h5py -y
$ mamba activate relearnn-1.0.0
$ pip install .
$ ./example_pipeline.sh
The harmless warnings and errors:
$ ./example_pipeline.sh
2025-02-03 16:55:28.942972: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-02-03 16:55:28.943023: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-02-03 16:55:28.944064: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-03 16:55:28.949335: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Warning: no demographic history file found. All training data will be simulated under demographic equilibrium.
Split chromosome: 2R...
...
Nice GPU utilization rate when running epochs:
great to hear