zmtttt

Results: 26 comments by zmtttt

> Does ndtimeline support multi-machine, multi-GPU setups? I can currently use the ndtimeline tool on a single machine with multiple GPUs to analyze GPT, but I wondered whether it supports multiple machines...

> > > Does ndtimeline support multi-machine, multi-GPU setups? I can currently use the ndtimeline tool on a single machine with multiple GPUs to analyze GPT, but I wondered whether it...

> Could you share how you're installing `transformer-engine`? I used the official method, `pip install --upgrade git+https://github.com/NVIDIA/TransformerEngine.git@stable`, but building the wheel for transformer-engine fails.

> Could you post the full build log?
>
> Building CMake extension transformer_engine
>
> Running command `/data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/.eggs/cmake-3.30.4-py3.8-linux-x86_64.egg/cmake/data/bin/cmake -S /data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/transformer_engine/common -B /data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/build/cmake -DPython_EXECUTABLE=/opt/conda/bin/python -DPython_INCLUDE_DIR=/opt/conda/include/python3.8 -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/build/lib.linux-x86_64-cpython-38 -DCMAKE_CUDA_ARCHITECTURES=70;80;89;90 -Dpybind11_DIR=/opt/conda/lib/python3.8/site-packages/pybind11/share/cmake/pybind11 -GNinja`
>
> CMake Error at /data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/.eggs/cmake-3.30.4-py3.8-linux-x86_64.egg/cmake/data/share/cmake-3.30/Modules/Internal/CMakeCUDAFindToolkit.cmake:104...

> Yep, same issue here. We were able to make pipeline parallelism work with a value of 2 on a single node, but beyond 2, and in multi-node settings,...