OSError: libtorch_global_deps.so: cannot open shared object file: No such file or directory

Open mateibejan1 opened this issue 2 years ago • 2 comments

After creating the environment and running the script provided in the Example Spacetimeformer Training Commands section of the repo, I get the following stack trace:

Traceback (most recent call last):
  File "train.py", line 7, in <module>
    import pytorch_lightning as pl
  File "/home/fsuser/miniconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 30, in <module>
    from pytorch_lightning.callbacks import Callback  # noqa: E402
  File "/home/fsuser/miniconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/callbacks/__init__.py", line 14, in <module>
    from pytorch_lightning.callbacks.base import Callback
  File "/home/fsuser/miniconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/callbacks/base.py", line 21, in <module>
    import torch
  File "/home/fsuser/.local/lib/python3.8/site-packages/torch/__init__.py", line 196, in <module>
    _load_global_deps()
  File "/home/fsuser/.local/lib/python3.8/site-packages/torch/__init__.py", line 149, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/fsuser/miniconda3/envs/spacetimeformer/lib/python3.8/ctypes/__init__.py", line 369, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/fsuser/.local/lib/python3.8/site-packages/torch/lib/libtorch_global_deps.so: cannot open shared object file: No such file or directory

The same stack trace can be reproduced by simply importing torch in a .py script.

I've checked /home/fsuser/.local/lib/python3.8/site-packages/torch/lib/ and there is no libtorch_global_deps.so file. Do I have to pull it from somewhere or install some other torch library?

I'm running this code on Ubuntu 20.04 with Python 3.8, torch 1.9.0, and CUDA 10.2.
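One detail in the traceback may be worth checking: pytorch_lightning is loaded from the conda environment's site-packages, while torch is loaded from /home/fsuser/.local/lib/python3.8/site-packages, which looks like the per-user site directory. The sketch below reports where a package would be imported from without actually importing it, and whether the shared library is present. It assumes the usual Linux layout with a lib/ folder inside the package directory; the helper name `locate_package` is hypothetical and not part of PyTorch or this repo.

```python
import importlib.util
import site
import sys
from pathlib import Path


def locate_package(name="torch", lib_name="libtorch_global_deps.so"):
    """Report where `name` would be imported from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return {"found": False}
    pkg_dir = Path(spec.origin).resolve().parent
    user_site = Path(site.getusersitepackages()).resolve()
    env_prefix = Path(sys.prefix).resolve()
    return {
        "found": True,
        "package_dir": str(pkg_dir),
        # True if the package lives under the per-user site directory
        "in_user_site": user_site in pkg_dir.parents,
        # True if the package lives under the active environment's prefix
        "in_env_prefix": env_prefix in pkg_dir.parents,
        # Assumed layout: <package_dir>/lib/<lib_name>
        "lib_present": (pkg_dir / "lib" / lib_name).exists(),
    }


if __name__ == "__main__":
    for key, value in locate_package().items():
        print(f"{key}: {value}")
```

If `in_user_site` comes back True while the environment is active, the interpreter is picking up a torch copy from outside the conda environment, which would be consistent with the paths in the traceback.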

mateibejan1, May 17 '22 17:05

Hi, this looks like a generic PyTorch installation error that isn't related to the spacetimeformer code or PyTorch Lightning (a third-party library that basically handles training loop boilerplate). Installing PyTorch can be surprisingly tricky at times, especially with CUDA version conflicts and so on. I recommend making a new environment and installing the latest version of PyTorch (1.11), which has been tested with the latest version of this repo. I've had to set up PyTorch on a bunch of different servers, and in my experience it's usually easier to fix CUDA compatibility issues by installing with conda rather than pip. See https://pytorch.org/get-started/locally/ for the install commands.

You'll know it worked if you can do

import torch
torch.cuda.is_available()

and get "True"
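A slightly more detailed version of that check, as a rough sketch: it reports the installed version, the CUDA version the build targets, and where the package was loaded from, and it catches the OSError from a missing shared library instead of crashing. The function name `check_torch` is made up for illustration.

```python
def check_torch():
    """Try to import torch and summarize the install without raising."""
    try:
        import torch
    except (ImportError, OSError) as exc:
        # OSError covers missing shared libraries such as libtorch_global_deps.so
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return {
        "ok": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "installed_at": torch.__file__,
    }


if __name__ == "__main__":
    for key, value in check_torch().items():
        print(f"{key}: {value}")
```

The `installed_at` path is the useful part when several copies of torch exist on the same machine.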

jakegrigsby, May 17 '22 23:05

Thanks for getting back to me!

I've recreated the environment with python 3.8, torch 1.11.0, cuda 10.2 and installed the requirements via pip. However, the error still persists.
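Since the error survives a fresh environment, one possibility that may be worth ruling out is an older torch under ~/.local shadowing the environment's copy. This is only a guess based on the /home/fsuser/.local/... paths in the original traceback. The sketch below asks a fresh interpreter where a module would resolve from, with and without the per-user site directory. The `-s` interpreter flag is standard Python; the helper name `module_origin` is hypothetical.

```python
import subprocess
import sys

# Runs in a child interpreter: resolve the module named in argv[1] without importing it.
PROBE = (
    "import importlib.util, sys; "
    "spec = importlib.util.find_spec(sys.argv[1]); "
    "print(spec.origin if spec else 'not found')"
)


def module_origin(name, ignore_user_site):
    """Ask a fresh interpreter where `name` would be imported from."""
    cmd = [sys.executable]
    if ignore_user_site:
        cmd.append("-s")  # -s: do not add the user site directory to sys.path
    cmd += ["-c", PROBE, name]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print("default      :", module_origin("torch", ignore_user_site=False))
    print("no user site :", module_origin("torch", ignore_user_site=True))
```

If the two lines differ, the copy under ~/.local is the one being imported by default. Setting the environment variable PYTHONNOUSERSITE=1 or uninstalling that copy are two ways to test whether it is the cause.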

mateibejan1, May 18 '22 07:05