eccct

Results 9 comments of eccct

再次运行benchmark,bash gemini.sh后系统长时间停顿在编译阶段。 ![bash gemini sh](https://github.com/user-attachments/assets/d79d1c54-5bf3-430d-954f-c9d45a379b3c) 使用colossalai check -i这个命令来检查目前环境里的版本兼容性以及CUDA Extension的状态。 ![colossalai check again](https://github.com/user-attachments/assets/483f7286-5aeb-40fb-b7b9-c35d5659788f) 请帮忙分析一下原因,谢谢!

I tried to use BUILD_EXT=1 pip install . and it failed to build, please check the log files I uploaded. Thanks! [pip install -r requirementsrequirements.txt](https://github.com/user-attachments/files/17091768/pip.install.-r.requirementsrequirements.txt) [BUILD_EXT=1 pip install.txt](https://github.com/user-attachments/files/17091770/BUILD_EXT.1.pip.install.txt)

Yesterday I ran " instead of "CUDA_EXT=1 pip install .", it build successfully. Then I ran benchmark with "bash gemini.sh", it took long time without responding. I updated the ticket...

(colo01) root@DESKTOP-5H0EB03:~/ColossalAI# echo $CUDA_HOME /usr/local/cuda-12.1 (colo01) root@DESKTOP-5H0EB03:~/ColossalAI# echo $PATH /root/miniconda3/envs/colo01/bin:/usr/local/cuda-12.1/bin:/root/miniconda3/envs/colo01/bin:/root/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/mnt/c/Windows/system32:/mnt/c/Windows:/mnt/c/Windows/System32/Wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0:/mnt/c/Windows/System32/OpenSSH:/mnt/c/Program Files (x86)/NVIDIA Corporation/PhysX/Common:/mnt/c/Program Files/NVIDIA Corporation/NVIDIA NvDLISR:/mnt/c/WINDOWS/system32:/mnt/c/WINDOWS:/mnt/c/WINDOWS/System32/Wbem:/mnt/c/WINDOWS/System32/WindowsPowerShell/v1.0:/mnt/c/WINDOWS/System32/OpenSSH:/mnt/c/Program Files/PuTTY:/mnt/c/Program Files/Git/cmd:/mnt/c/Users/eccct/anaconda3/Scripts:/mnt/c/Program Files/OpenSSH-Win64:/mnt/c/Program Files/nodejs:/mnt/c/Program Files/Microsoft SQL Server/130/Tools/Binn:/mnt/c/Program Files/Microsoft SQL Server/Client SDK/ODBC/170/Tools/Binn:/mnt/c/Program Files/Docker/Docker/resources/bin:/mnt/c/Users/eccct/AppData/Local/Microsoft/WindowsApps:/mnt/d/A:/mnt/c/Users/eccct/Anaconda3:/mnt/c/Users/eccct/Anaconda3/Library/mingw-w64/bin:/mnt/c/Users/eccct/Anaconda3/Library/usr/bin:/mnt/c/Users/eccct/Anaconda3/Library/bin:/mnt/c/Users/eccct/Anaconda3/Scripts:/mnt/c/Users/eccct/AppData/Roaming/Aria2/maria2c.exe:/mnt/d/Microsoft VS Code/bin:/mnt/c/Users/eccct/AppData/Roaming/npm:/mnt/c/Users/eccct/.dotnet/tools:/mnt/c/Users/eccct/AppData/Local/Programs/Azure...

(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# bash gemini.sh /root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/pipeline/schedule/_utils.py:19: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. _register_pytree_node(OrderedDict, _odict_flatten, _odict_unflatten) /root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/utils/_pytree.py:332: UserWarning: is already registered as pytree node. Overwriting the previous registration. warnings.warn( /root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/shardformer/layer/normalization.py:45:...

(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# bash gemini.sh /root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/pipeline/schedule/_utils.py:19: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. _register_pytree_node(OrderedDict, _odict_flatten, _odict_unflatten) /root/miniconda3/envs/colo01/lib/python3.10/site-packages/torch/utils/_pytree.py:332: UserWarning: is already registered as pytree node. Overwriting the previous registration. warnings.warn( /root/miniconda3/envs/colo01/lib/python3.10/site-packages/colossalai-0.4.4-py3.10.egg/colossalai/shardformer/layer/normalization.py:45:...

@flybird11111 (colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# gcc --version gcc (Ubuntu 12.3.0-17ubuntu1) 12.3.0 Copyright (C) 2022 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty;...

@flybird11111 (colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# sudo update-alternatives --config gcc There are 3 choices for the alternative gcc (providing /usr/bin/gcc). Selection Path Priority Status ------------------------------------------------------------ 0 /usr/bin/gcc-12 60 auto mode 1 /usr/bin/gcc-10 60...

@flybird11111 I changed benchmark.py degrade model to 1b, num_hidden_layers=1 and set cache to 256, As it shows that "Model params: 615.01 M", in which condition cause out of memroy? colossalai...