
[import jittor tries to download NCCL even when NCCL is already loaded]

Open baibizhe opened this issue 2 years ago • 1 comments

Describe the bug

Hello, jittor maintainers. We are trying to install jittor on a supercomputing cluster. After working around several linking problems, one NCCL problem remains: our GPU (compute) nodes have no network access, yet jittor insists on downloading NCCL even though NCCL has already been loaded. Is there any way to stop import jittor from downloading NCCL?

Full Log

This is the email from the administrator of our computing system: "This is an interesting code. I got it to install using the following commands.

  1. Load the required modules.

$ module load gcc
$ module load python/3.10
$ module load opencv
$ module load cuda
$ module load imkl
$ module load nccl

  2. Create a virtual environment and activate it.

$ virtualenv venv
$ source venv/bin/activate

  3. Get the code and install the needed Python packages. Install JNeRF.

(venv)$ git clone https://github.com/Jittor/JNeRF
(venv)$ cd JNeRF
(venv)$ PYTHONPATH= pip install open3d
(venv)$ pip install -r requirements.txt
(venv)$ pip install -e .

  4. This code has issues. Or rather, the jittor package does not appear to be well crafted and doesn't know how to find things that are in non-standard locations. Let's put in some links to help it along.

(venv)$ cd ../venv/bin
(venv)$ ln -s $EBROOTPYTHON/bin/python3.10-config
(venv)$ cd ..
(venv)$ ln -s $EBROOTPYTHON/include

You should now be able to import it. However, it appears that it can only be imported if a GPU is available. If you go onto a compute node that has a GPU you run into another problem, as jittor insists upon downloading and trying to install NCCL, even though the NCCL module is loaded. Given that compute nodes don't have internet access this is going to fail. You'll need to figure out how to convince jittor to not install NCCL.

"



baibizhe • Sep 26 '23 00:09

I am on jittor 1.3.9.14. The source contains the following code for detecting NCCL; perhaps NCCL was simply not detected:

if not use_nccl: return
nccl_include_path = os.environ.get("nccl_include_path")
nccl_lib_path = os.environ.get("nccl_lib_path")
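To see what this lookup would return in a given shell or job script, a minimal diagnostic sketch like the one below may help. It reads only the two variable names quoted above; the helper name `read_nccl_env` is hypothetical and not part of jittor.

```python
import os

# The two variable names are taken from the jittor snippet quoted above.
NCCL_ENV_VARS = ("nccl_include_path", "nccl_lib_path")


def read_nccl_env(environ=None):
    """Return {name: value or None} for the variables jittor looks up.

    Hypothetical helper for diagnosis only; it is not part of jittor.
    """
    env = os.environ if environ is None else environ
    # .get() returns None for an unset variable, as in the quoted snippet.
    return {name: env.get(name) for name in NCCL_ENV_VARS}


if __name__ == "__main__":
    for name, value in read_nccl_env().items():
        print(f"{name} = {value!r}")
```

If either line prints `None`, the variable is not visible to the Python process, which is the situation described in this comment.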

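I hit the same situation on a server where NCCL was installed manually without network access. Checking .bashrc and .profile showed that the environment variables jittor needs were not set. Adding them lets jittor detect the NCCL already installed on the server and skip the download step: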

export nccl_include_path=nccl-master/build/include
export nccl_lib_path=nccl-master/build/lib
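Before running import jittor on a node without network access, it may be worth confirming that the two directories actually contain an NCCL header and shared library. The sketch below is a hypothetical pre-flight check, not part of jittor. It assumes the standard NCCL file names `nccl.h` and `libnccl.so*`, and it does not know how jittor itself validates these paths.

```python
import os
from pathlib import Path


def check_nccl_paths(include_dir, lib_dir):
    """Return a list of problems; an empty list means both paths look usable.

    Hypothetical pre-flight check, not part of jittor.
    """
    problems = []
    # The include directory is expected to hold the NCCL header.
    if not include_dir or not (Path(include_dir) / "nccl.h").is_file():
        problems.append(f"nccl.h not found under nccl_include_path={include_dir!r}")
    # The lib directory is expected to hold libnccl.so or a versioned variant.
    if not lib_dir or not any(Path(lib_dir).glob("libnccl.so*")):
        problems.append(f"libnccl.so* not found under nccl_lib_path={lib_dir!r}")
    return problems


if __name__ == "__main__":
    issues = check_nccl_paths(
        os.environ.get("nccl_include_path"),
        os.environ.get("nccl_lib_path"),
    )
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("NCCL paths look plausible; try importing jittor next.")
```

Note that relative paths such as the ones in the exports above are resolved against the current working directory in this check, so absolute paths are probably the safer choice in .bashrc or a job script.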

MilknoCandy • May 05 '25 07:05