Google Colab: sleap and scipy issues

Open ajuavinett opened this issue 11 months ago • 8 comments

Bug description

Colab cannot install sleap. This appears to be an issue with scipy, which also fails to install in a Colab environment.

Expected behaviour

After running the following in Colab...

!pip uninstall -y opencv-python opencv-contrib-python
!pip install "sleap[pypi]>=1.3.3"

... I expected sleap to be installed and usable! :) Similarly, after running !pip install scipy==1.9.0 (or other versions), I expected scipy to be installed.

Actual behaviour

After trying to install sleap, I receive the following message (after the other dependencies install successfully):

Collecting scipy<=1.9.0,>=1.4.1 (from sleap[pypi]>=1.3.3)
  Using cached scipy-1.9.0.tar.gz (42.0 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  error: subprocess-exited-with-error
  
  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Preparing metadata (pyproject.toml) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

After trying to install scipy on its own, I receive a similar message.

Your personal set up

This is in Colab, which currently ships with Python 3.11.11.

How to reproduce

Run

!pip uninstall -y opencv-python opencv-contrib-python
!pip install "sleap[pypi]>=1.3.3"

or

!pip uninstall -y opencv-python opencv-contrib-python
!pip install "sleap[pypi]>=1.4.1"

or

!pip install scipy==1.9.0

in a new Colab notebook.

ajuavinett avatar Jan 29 '25 04:01 ajuavinett

Hi @ajuavinett!

Colab recently updated its Python to 3.11, which requires scipy > 1.9. Our package, however, uses Python 3.7.12, which supports scipy <= 1.9 (we pinned the version to resolve dependency conflicts). We could get past the scipy dependency, but there would then be a conflict with tensorflow on the latest Python version. You could try creating a virtual env in Colab with Python v3.7.12 (it's a bit tricky but might work).
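
To see which side of this version boundary a runtime is on, the interpreter can be checked before attempting the install. The following is only a sketch: the helper name pinned_scipy_expected_to_fail is made up for this example, and the 3.11 cutoff is taken from this thread, not from sleap or scipy documentation.

```python
import sys


def pinned_scipy_expected_to_fail(version_info=sys.version_info):
    """Return True if, per this thread, `scipy<=1.9.0` is expected not to install.

    Assumption: the failure starts at Python 3.11 (the version Colab ships),
    as described in this thread. Hypothetical helper, not part of sleap.
    """
    major, minor = version_info[0], version_info[1]
    return (major, minor) >= (3, 11)


if pinned_scipy_expected_to_fail():
    print("Python 3.11 or newer: the pinned sleap install will likely fail.")
else:
    print("Python older than 3.11.")
```

Note that sys.version_info describes the notebook kernel, which is not necessarily the interpreter that a `!python` shell command would start.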

Thanks,

Divya

gitttt-1234 avatar Jan 31 '25 18:01 gitttt-1234

Thank you! Okay, so I was able to install a virtual environment for Python 3.7 in Colab using the following:

# Install venv if it's not already installed
!sudo apt-get update -y
!sudo apt-get install python3.7-venv 

# Create a virtual environment named 'my_sleap_env'
!python3.7 -m venv my_sleap_env 

# Activate the virtual environment
!source my_sleap_env/bin/activate 

# Install sleap
!pip install sleap[pypi]==1.4.1

This gets past the scipy issue, and many other packages install just fine. However, I'm now receiving an error: legacy-install-failure for python-rapidjson. Can you think of any way around this?
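
One thing to keep in mind with the cell above: in Colab each `!` line runs in its own shell, so the effect of `source my_sleap_env/bin/activate` does not carry over to the following `!pip` line. One possible way around this is to call the environment's own interpreter by path, sketched below. The helper name venv_pip_install_command is hypothetical, and whether this avoids the python-rapidjson error has not been tested here.

```python
import subprocess
from pathlib import Path


def venv_pip_install_command(env_dir, requirement):
    """Build a pip command that runs inside the given virtual environment.

    Hypothetical helper: it relies on the standard venv layout on Linux,
    where the environment's interpreter lives at <env_dir>/bin/python.
    """
    python = Path(env_dir) / "bin" / "python"
    return [str(python), "-m", "pip", "install", requirement]


cmd = venv_pip_install_command("my_sleap_env", "sleap[pypi]==1.4.1")
print(" ".join(cmd))
# To actually run it (assumes my_sleap_env was created as in the cell above):
# subprocess.run(cmd, check=True)
```

Passing the requirement as a single list element also sidesteps any shell interpretation of the square brackets in sleap[pypi].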

ajuavinett avatar Feb 04 '25 21:02 ajuavinett

Just wanted to share a solution that's worked for me:

# python version at start-up
!python --version

# downgrade to python 3.7 for compatibility
# refresh the package lists first, then install without interactive prompts
!apt-get update -y
!apt-get install -y python3.7
!update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 1
## type 3 when prompted
!update-alternatives --config python3
!apt install -y python3-pip
!apt install -y python3.7-distutils

# current python version - should be 3.7
!python --version

# install old rapidjson version for compatibility
!pip install "python-rapidjson<=1.10"

# install sleap
!pip uninstall -qqq -y opencv-python opencv-contrib-python
!pip install -qqq "sleap[pypi]>=1.4.1"
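
Since the later steps depend on the version check really reporting 3.7, that check can be made explicit instead of read by eye. This is a sketch only: parse_python_version and shell_python_version are illustrative names, not part of sleap or Colab.

```python
import re
import subprocess


def parse_python_version(text):
    """Extract (major, minor) from output such as 'Python 3.7.17'."""
    match = re.search(r"Python (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version output: {text!r}")
    return int(match.group(1)), int(match.group(2))


def shell_python_version(executable="python3"):
    """Ask an interpreter on PATH for its version, like `!python3 --version`."""
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=True
    )
    # Some old interpreters print the version on stderr, so check both streams.
    return parse_python_version(result.stdout + result.stderr)


print(parse_python_version("Python 3.7.17"))
```

In a notebook, something like `assert shell_python_version() == (3, 7)` placed before the sleap install would stop the cell early if the downgrade did not take effect.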

jocateme avatar Mar 07 '25 12:03 jocateme

Thank you so much, @jocateme -- this is great and does seem to work, yay. Leaving the issue open for the sleap team because they may want to modify the Colab notebooks that launch from sleap to include this fix.

ajuavinett avatar Mar 13 '25 19:03 ajuavinett

I was watching this because I also ran into this problem back in January. As far as I can tell (someone correct me if I'm wrong), this fix doesn't allow training on GPU. There seem to be compatibility issues between the Python and TensorFlow versions and the GPU drivers and CUDA version in the updated Colab. A few weeks ago I spent a weekend trying to roll back to compatible versions of CUDA, cuDNN, etc., but wasn't able to get it working after much trial and error. 🙃

Info printed when training:

2025-03-25 02:00:26.853364: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.7/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2025-03-25 02:00:26.853399: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
INFO:matplotlib.font_manager:generated new fontManager
INFO:sleap.nn.training:Versions:
SLEAP: 1.4.1
TensorFlow: 2.8.4
Numpy: 1.21.6
Python: 3.7.17
OS: Linux-6.1.85+-x86_64-with-Ubuntu-22.04-jammy
. . .
INFO:sleap.nn.training:
2025-03-25 02:00:31.309258: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-03-25 02:00:31.309647: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.7/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2025-03-25 02:00:31.309878: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.7/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2025-03-25 02:00:31.310061: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.7/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2025-03-25 02:00:31.310224: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.7/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2025-03-25 02:00:32.138608: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.7/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2025-03-25 02:00:32.138915: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.7/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2025-03-25 02:00:32.138950: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
INFO:sleap.nn.training:Running in CPU-only mode.
INFO:sleap.nn.training:System:
GPUs: None detected.
INFO:sleap.nn.training:
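
A log like this is easier to act on once the missing libraries are pulled out into a short list. The sketch below does that with a regular expression matched against the "Could not load dynamic library" wording seen in the log; the function name missing_cuda_libraries is illustrative, and the sample text is a two-line excerpt of the output above.

```python
import re

# Matches the quoted library name in TensorFlow's dso_loader warnings.
MISSING_LIBRARY = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_cuda_libraries(log_text):
    """Return the library names reported as not loadable, in first-seen order."""
    seen = []
    for name in MISSING_LIBRARY.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


sample = (
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:64] "
    "Could not load dynamic library 'libcudart.so.11.0'; dlerror: "
    "libcudart.so.11.0: cannot open shared object file: No such file or directory\n"
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:64] "
    "Could not load dynamic library 'libcudnn.so.8'; dlerror: "
    "libcudnn.so.8: cannot open shared object file: No such file or directory\n"
)
print(missing_cuda_libraries(sample))
```

For the full log above, the names carry version suffixes such as .so.11 and .so.8, which suggests this TensorFlow 2.8.4 build is looking for CUDA 11 and cuDNN 8 libraries that are not on the listed LD_LIBRARY_PATH.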

caylamiller avatar Mar 25 '25 02:03 caylamiller

I ran into this issue as well and have created a short-term fix that allows you to download sleap and run on the GPU.

# override and install python 3.10
! wget -O mini.sh https://repo.anaconda.com/miniconda/Miniconda3-py310_22.11.1-1-Linux-x86_64.sh
! chmod +x mini.sh
! bash ./mini.sh -b -f -p /usr/local
! conda install -q -y jupyter
! conda install -q -y google-colab -c conda-forge
! python -m ipykernel install --name "py310" --user

# confirm using python 3.10
! python3 --version

# install sleap and dependencies
! pip install sleap[pypi]

# install additional sleap dependencies and load GPU support
! pip install matplotlib-inline
! pip install ipython
! apt-get install cuda-11-8
! apt-get install -y libcudnn8=8.6.0.163-1+cuda11.8
! pip install numpy==1.23

! export PATH=/usr/local/cuda-11.8/bin:$PATH
! export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
! export CUDNN_INCLUDE_DIR=/usr/local/cuda/include
! export CUDNN_LIB_DIR=/usr/local/cuda/lib64

You may need to restart the training session after the last command to ensure the changes take effect. This has been working for me.
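
A note on the `! export` lines: each `!` line runs in its own shell, so a variable exported that way is gone once the line finishes. One alternative, sketched here, is to set the variables on the notebook process with os.environ, which later `!` commands inherit. prepend_env_path is an illustrative helper, and the paths are copied from the export lines above, so they assume CUDA 11.8 ended up in /usr/local/cuda-11.8.

```python
import os


def prepend_env_path(name, entry, env=os.environ):
    """Put `entry` at the front of the path-like variable `name`, skipping duplicates."""
    current = [p for p in env.get(name, "").split(os.pathsep) if p]
    if entry not in current:
        current.insert(0, entry)
    env[name] = os.pathsep.join(current)
    return env[name]


# Paths taken from the export lines in this comment (assumed install locations).
prepend_env_path("PATH", "/usr/local/cuda-11.8/bin")
prepend_env_path("LD_LIBRARY_PATH", "/usr/local/cuda-11.8/lib64")
os.environ["CUDNN_INCLUDE_DIR"] = "/usr/local/cuda/include"
os.environ["CUDNN_LIB_DIR"] = "/usr/local/cuda/lib64"
```

Commands started afterwards from the notebook, for example a training run launched with `!`, should then see these values; this has not been verified against the notebook linked below.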

UPDATE: I published a colab script that can successfully run sleap training using the GPU: https://colab.research.google.com/github/deliacurran/condron-lab-sleap-analysis/blob/main/sleap_training_and_inference_guide.ipynb. There have been a ton of issues with this new release.

deliacurran avatar Apr 01 '25 21:04 deliacurran

@deliacurran, I tried out your script for training on the GPU. I can't get it to run because I get numpy compatibility issues with numpy==1.23. Am I doing something wrong?

cemil-du avatar Apr 24 '25 10:04 cemil-du

Hi! @deliacurran, I just wanted to say that your Colab notebook worked for me! Thank you so much for sharing!

Kagiro-K avatar May 21 '25 23:05 Kagiro-K