ImportError: /root/miniconda3/envs/bluefog/lib/python3.8/site-packages/bluefog/torch/mpi_lib.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor6deviceEv
Running the tutorial produces the error above. What is the cause?
Hi, can you post your environment settings? That error most likely means that the symbol at::Tensor::device cannot be found. at is the ATen library inside PyTorch, so I guess the problem is related either to your PyTorch version or to a failure to link the PyTorch symbols when the BlueFog library was built. It would help if you posted how you installed BlueFog.
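To show how the mangled name in the error maps to at::Tensor::device, here is a minimal sketch of a demangler. It is not a full Itanium C++ ABI implementation: it only handles nested names with an optional const qualifier and no arguments, and the function name demangle_nested_name is made up for this illustration. In practice a tool such as c++filt does the same job.

```python
def demangle_nested_name(symbol: str) -> str:
    """Demangle a simple Itanium-ABI nested name, e.g. _ZNK2at6Tensor6deviceEv.

    Supported shape only: _Z N [K] <len><name>... E v
    (nested name, optional const qualifier, no parameters, no templates).
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos = 3
    is_const = False
    if pos < len(symbol) and symbol[pos] == "K":
        # 'K' marks a const-qualified member function.
        is_const = True
        pos += 1
    parts = []
    while pos < len(symbol) and symbol[pos].isdigit():
        # Each component is a decimal length followed by that many characters.
        start = pos
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        length = int(symbol[start:pos])
        parts.append(symbol[pos:pos + length])
        pos += length
    # 'E' closes the nested name; 'v' means an empty (void) parameter list.
    if symbol[pos:] != "Ev":
        raise ValueError("unsupported mangling (only zero-argument functions)")
    return "::".join(parts) + "()" + (" const" if is_const else "")


print(demangle_nested_name("_ZNK2at6Tensor6deviceEv"))
# at::Tensor::device() const
```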
I use Anaconda. The Python version is 3.8 and the PyTorch version is 1.8. I downloaded the wheel from the official website and installed it locally. The file is torch-1.8.1+cu102-cp38-cp38-linux_x86_64.whl. Is this package the problem? Does it have nothing to do with my OpenMPI?
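For a fuller environment report of the kind requested above, something like the following sketch could be pasted into the thread. It reads installed package versions through the standard library without importing the packages, so it still works when the BlueFog import itself fails. The distribution names torch and bluefog are assumptions about how the packages were installed, and collect_versions is a hypothetical helper name.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "bluefog")):
    """Return interpreter, platform, and package versions as a dict.

    The default package names are assumptions; pass the names that match
    the installation being debugged.
    """
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            # Reads the installed distribution metadata; does not import the package.
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```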
- That should not be related to OpenMPI, because the failure is in linking a symbol on the C++ side, where our backend depends on PyTorch.
- I don't have a good idea why it cannot find the symbol (this is the first time I have seen it), but I would suggest downgrading torch to 1.5.
- I found a similar issue: https://github.com/aim-uofa/AdelaiDet/issues/181 Try building BlueFog from the GitHub source; this page may help: https://github.com/Bluefog-Lib/bluefog/wiki/BlueFog-Development-Guide
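Before downgrading or rebuilding, it may help to confirm whether the installed PyTorch actually exports the missing symbol. The sketch below assumes the binutils nm tool is on the PATH and that the PyTorch shared libraries sit in a lib directory next to the torch package. Both are assumptions about the setup, and the helper names are hypothetical.

```python
import glob
import os
import subprocess


def defines_symbol(nm_output: str, symbol: str) -> bool:
    """Return True if `nm -D` output lists `symbol` as defined (not 'U')."""
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols print three fields: address, type, name.
        # Undefined symbols print two: 'U', name.
        if len(fields) != 3 or fields[1] == "U":
            continue
        # Drop any symbol-version suffix such as name@@VERSION.
        if fields[2].split("@")[0] == symbol:
            return True
    return False


def libraries_defining(symbol: str, lib_dir: str) -> list:
    """Return the shared libraries in lib_dir that define `symbol`."""
    hits = []
    for path in sorted(glob.glob(os.path.join(lib_dir, "*.so"))):
        result = subprocess.run(
            ["nm", "-D", path], capture_output=True, text=True
        )
        if result.returncode == 0 and defines_symbol(result.stdout, symbol):
            hits.append(path)
    return hits
```

A possible use is to call libraries_defining("_ZNK2at6Tensor6deviceEv", lib_dir) with lib_dir set to the lib folder inside the installed torch package. An empty result would suggest that this PyTorch build does not export the symbol the BlueFog extension was compiled against, which points to a version mismatch rather than an OpenMPI problem.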