Perkz Zheng comments

Results: 84 comments of Perkz Zheng

[BUG] UCX issue for Multi-GPU criteo/DLRM

I have tried the original DLRM criteo multi-GPU script in /examples; it still leads to UCX errors when setting -p "ucx".

Cannot do inference for any model on more than two nodes

Hi @duli2012, can you make sure you have the same ENV settings on all nodes? You can set `NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=ENV` to generate the logs when you run the gpt_example, and...

LLama model does not work on multi-gpu

Can you add NCCL_DEBUG=INFO and see if we can get more detailed logs?

LLama model does not work on multi-gpu

> Can you tell me which version of torch, nccl, cuda, and cudnn I should use to check the operation of the main branch?

It is recommended to use the...

LLama model does not work on multi-gpu

@khj94 Can you try NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,ENV,COLL,NET for a more detailed log? Thanks.
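The environment-variable suggestions above can also be applied from a small launcher script. This is a minimal sketch, not part of the original comments: the helper names `nccl_debug_env` and `run_with_nccl_debug` are hypothetical, and the command passed to the launcher is whatever you normally run. Only the `NCCL_DEBUG` and `NCCL_DEBUG_SUBSYS` variables come from the comments themselves.

```python
import os
import subprocess


def nccl_debug_env(subsystems=("INIT", "ENV", "COLL", "NET"), base_env=None):
    """Return a copy of the environment with NCCL debug logging enabled.

    `subsystems` is joined into NCCL_DEBUG_SUBSYS; `base_env` defaults to
    the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_DEBUG"] = "INFO"
    env["NCCL_DEBUG_SUBSYS"] = ",".join(subsystems)
    return env


def run_with_nccl_debug(command, subsystems=("INIT", "ENV", "COLL", "NET")):
    """Run `command` (a list of arguments) with NCCL debug logging enabled."""
    # check=False: the run is expected to fail while debugging, and the goal
    # is to collect the NCCL logs rather than raise on a non-zero exit code.
    return subprocess.run(command, env=nccl_debug_env(subsystems), check=False)
```

Calling `run_with_nccl_debug([...])` with your usual launch command as the argument list, on every node, is one way to keep the debug settings identical across nodes when comparing logs.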

LLama model does not work on multi-gpu

> I am experimenting with smoothquant, and an error occurred during checkpoint conversion with the [command](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#smoothquant).
>
> When I download the llama2-7b model from huggingface and convert the checkpoint...

NCCL errors while running LLAMA2 70b benchmark shmoo with batch size=128 and input length=2048 on 4 H100 GPUs