ImportError: /root/miniconda3/envs/bluefog/lib/python3.8/site-packages/bluefog/torch/mpi_lib.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor6deviceEv

Open yangxuanfei opened this issue 3 years ago • 3 comments

Running the tutorial produces the above error. What is the reason?

yangxuanfei avatar May 13 '22 15:05 yangxuanfei

Hi, can you post your environment settings? That error probably means at::Tensor::device cannot be found among the available symbols. at is the ATen library inside PyTorch, so my guess is that it is related to your PyTorch version, or that the BlueFog build failed to link the PyTorch symbols. It would also help if you posted how you installed the BlueFog library.
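To see why the mangled name in the error maps to at::Tensor::device, here is a minimal sketch of a demangler. It is not a general tool: it assumes the symbol has the simple Itanium C++ ABI shape `_ZN[K]<length><name>...E<params>` with an empty parameter list, which is the shape of the symbol in this issue. The function name `demangle_simple` is made up for illustration.

```python
import re


def demangle_simple(symbol: str) -> str:
    """Demangle only the simple form _ZN[K]<len><name>...Ev.

    Handles nested names with an optional const qualifier and an empty
    parameter list; anything else raises ValueError.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos = 3
    # A 'K' right after '_ZN' marks a const member function.
    is_const = symbol[pos:pos + 1] == "K"
    if is_const:
        pos += 1
    parts = []
    # Each name component is written as <decimal length><identifier>.
    while pos < len(symbol) and symbol[pos] != "E":
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError("unsupported mangling construct")
        length = int(match.group())
        pos += len(match.group())
        parts.append(symbol[pos:pos + length])
        pos += length
    if pos >= len(symbol):
        raise ValueError("missing 'E' terminator")
    # Everything after 'E' encodes the parameters; 'v' means none (void).
    if symbol[pos + 1:] != "v":
        raise ValueError("only empty parameter lists are supported")
    return "::".join(parts) + "()" + (" const" if is_const else "")


if __name__ == "__main__":
    print(demangle_simple("_ZNK2at6Tensor6deviceEv"))
```

Where binutils is installed, `c++filt` does the same job for arbitrary symbols.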

BichengYing avatar May 13 '22 19:05 BichengYing

I use Anaconda. The Python version is 3.8 and the PyTorch version is 1.8. I downloaded the installation package from the official website and installed it locally. The package is torch-1.8.1+cu102-cp38-cp38-linux_x86_64.whl. Is the problem with this installation package? Does it have nothing to do with my OpenMPI?

yangxuanfei avatar May 14 '22 06:05 yangxuanfei

  1. This should not be related to OpenMPI. The failure is in linking the symbol, which happens on the C++ side, since our backend depends on PyTorch.
  2. I don't have a good idea why it cannot find the symbol (this is the first time I have seen it), but my suggestion would be to try downgrading torch to 1.5.
  3. There is a similar issue at https://github.com/aim-uofa/AdelaiDet/issues/181 and you could also try building BlueFog from the GitHub source, following this page: https://github.com/Bluefog-Lib/bluefog/wiki/BlueFog-Development-Guide
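For anyone who hits the same error, the environment details asked for earlier in the thread can be gathered with a short script. This is only a sketch: the function name `collect_env_report` and the dictionary keys are made up for illustration, and the torch fields are filled in only when torch is importable. One possible cause of the error, not confirmed in this thread, is a PyTorch version that differs from the one BlueFog was built against, so the exact torch version is the most useful field to report.

```python
import importlib.util
import platform
import sys


def collect_env_report() -> dict:
    """Collect interpreter and (if installed) PyTorch build details."""
    report = {
        "python": platform.python_version(),
        "executable": sys.executable,
        "platform": platform.platform(),
        "torch": None,
        "torch_cuda": None,
        "torch_cxx11_abi": None,
    }
    # Only import torch when it is actually installed in this environment.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        # CUDA version torch was built with (None for CPU-only builds).
        report["torch_cuda"] = torch.version.cuda
        # Whether torch was compiled with the C++11 ABI; often asked for
        # in reports about C++ extension build problems.
        report["torch_cxx11_abi"] = torch.compiled_with_cxx11_abi()
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

Run it inside the same conda environment that raises the ImportError, so the reported interpreter and torch build are the ones BlueFog actually loads.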

BichengYing avatar May 17 '22 06:05 BichengYing