
Installation runs into CUDA problem

Open cap-jmk opened this issue 3 years ago • 12 comments

The bug reported at https://github.com/rusty1s/pytorch_sparse/issues/180 and https://github.com/pyg-team/pytorch_geometric/issues/4095 propagates to deeptime, too. Pinning PyTorch to an older version in the setup might help.

cap-jmk avatar Feb 18 '22 10:02 cap-jmk

This seems to be a problem related to incompatible pytorch and pytorch_sparse versions. Here we only depend on pytorch, and that only weakly; there is no explicit dependency. In that sense I am a bit uncomfortable pinning a version in the setup, as it would introduce a hard dependency.

clonker avatar Feb 18 '22 10:02 clonker

I experienced the error while installing deeptime in an isolated conda environment on the newest Ubuntu. As pip was pulling the default PyTorch, the error occurred for plain PyTorch, too. The error also occurs on Colab when using PyTorch. From what I know, the error is caused by CUDA compilation and is thus independent of any specific Python package. Anyhow, one can't use deeptime in that case, so I recommend fixing the error.

cap-jmk avatar Feb 19 '22 09:02 cap-jmk

That is very odd; deeptime is not supposed to pull pytorch at all. Can you try again in an isolated environment and paste the output here? If you have a look here you can see that pytorch is only an "extras" dependency, so a plain pip install deeptime shouldn't pull it. Here is what you can run to check the installed dependencies of a pip package (example output for a test installation of mine):

~  pip show deeptime
Name: deeptime
Version: 0.4.1
[...]
Requires: numpy, scikit-learn, scipy, threadpoolctl
Required-by:
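
The same check can be done from Python. The following is a minimal sketch, assuming Python 3.8+ where `importlib.metadata.requires` is available; the helper names `declared_requirements` and `split_requirements` are hypothetical and not part of deeptime. It separates hard requirements from those that only apply when an extra is requested.

```python
from importlib import metadata


def declared_requirements(dist_name):
    """Return the requirement strings declared by an installed distribution.

    Returns None if the distribution is not installed.
    """
    try:
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return None


def split_requirements(requirements):
    """Split requirement strings into (hard, extras_only) lists.

    A requirement counts as extras-only if its environment marker
    (the part after ';') mentions `extra == ...`.
    """
    hard, extras_only = [], []
    for req in requirements or []:
        _, _, marker = req.partition(";")
        if "extra==" in marker.replace(" ", ""):
            extras_only.append(req)
        else:
            hard.append(req)
    return hard, extras_only
```

For example, `split_requirements(declared_requirements("deeptime"))` would list pytorch under the extras-only requirements if it is declared as an extra, as described above.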

Please let me know what you find, thanks!

clonker avatar Feb 19 '22 20:02 clonker

However, for deeptime it is required to install torch. Maybe you can try reproducing the error with pulling default torch on a CUDA machine with CUDA 11.1. While the bug is present, I think the user will wonder why they can't use the full functionality of deeptime, or why the import of deeptime fails at all. Maybe the user will think the library is faulty and skip using it.

cap-jmk avatar Feb 21 '22 08:02 cap-jmk

Ah, now I see what you mean. I think it's a good idea to catch such an import error. 🙂 Pinning the version in the setup doesn't seem very sensible to me though, as we do not depend on pytorch. Here is what happens: deeptime checks if pytorch is installed and, if so, imports certain deep learning submodules. I will add a check whether torch could be successfully imported, rather than just checking whether the namespace is available.
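
The difference between "the namespace is available" and "the import succeeds" can be sketched as follows. This is an illustration only, not deeptime's actual code; the helper names `is_discoverable` and `is_importable` are hypothetical.

```python
import importlib
import importlib.util


def is_discoverable(name):
    """True if the import system can locate the module.

    This does not execute the module, so a package whose compiled
    parts are broken still counts as discoverable.
    """
    return importlib.util.find_spec(name) is not None


def is_importable(name):
    """True only if importing the module actually succeeds."""
    try:
        importlib.import_module(name)
    except Exception:
        # ImportError covers a missing module or a failed extension load;
        # OSError can be raised when a shared library fails to load.
        return False
    return True
```

With a broken torch installation, `is_discoverable("torch")` would still return True while `is_importable("torch")` returns False, so optional submodules should be guarded by the second check.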

clonker avatar Feb 21 '22 09:02 clonker

Great fix 🚀
While looking into the torch bug, I noticed another, similar bug. When installing from pip, the C++ code in the numerical module is not always compiled correctly. Installing from conda works, though. The bug looks similar to the other one.

undefined symbol:  _ZNSt15__exception_ptr13exception_ptr10_M_releaseE

Ref: https://github.com/pybind/pybind11/issues/3623

I am not sure if it is worth fixing at all. Just wanted to report in case there is some inconsistency in the distributions.

cap-jmk avatar Feb 21 '22 10:02 cap-jmk

Ah, thank you for bringing it to my attention! That is one of the drawbacks of using an sdist over a binary distribution with pip. On the other hand, I do like that it is compiled locally. Basically a toolchain setup problem... not sure how one would even go about fixing that. Aside from using a binary distribution, of course :)

clonker avatar Feb 21 '22 10:02 clonker

From the user perspective, I think it is whatever floats the boat. When building packages that have deeptime as a dependency, it would be useful to be able to reliably pull it from pip. Otherwise, the new package distributed via PyPI would have the same problem, and the bug would propagate forever… If the faulty behaviour is present, one could also redirect the user to the conda build or provide additional instructions. Maybe a simple test during the setup procedure would help to decide what to do. What do you think?

Conda is not an option in every environment.

cap-jmk avatar Feb 21 '22 11:02 cap-jmk

Hey @MQSchleich, I've been experimenting a bit with CMake as the dominant build system; I'd imagine it is a bit more robust with respect to incompatible toolchains. Also, the initial pytorch issue should be fixed on the branch of PR #215. If you'd like and have some time, I'd appreciate it if you could try it out and see if the problem persists.

clonker avatar Mar 02 '22 20:03 clonker

@clonker, did you upload it to PyPI yet? I tried it out on the problematic machine, and it did indeed persist...

cap-jmk avatar Mar 07 '22 15:03 cap-jmk

No, it's not on PyPI yet; you'll have to install from the remote:

pip install git+https://github.com/deeptime-ml/deeptime.git@main

clonker avatar Mar 07 '22 18:03 clonker

Ping on this one, with the new version it should also work via pip install deeptime.

clonker avatar Apr 12 '22 07:04 clonker

I assume this is either no longer an issue or abandoned; please feel free to reopen otherwise. :)

clonker avatar Aug 18 '22 08:08 clonker