Bill Xu
Thank you for responding to my request. I have done `pip install git+https://github.com/huggingface/transformers` and `pip install git+https://github.com/huggingface/accelerate`, as confirmed by [pip_freeze](https://github.com/cxxz/llama_deepspeed_autotune/blob/main/zero2_autotuner_logs/pip_freeze_all.txt). However, upon rerunning [run_autotune_llama_4A100.sh](https://github.com/cxxz/llama_deepspeed_autotune/blob/main/run_autotune_llama_4A100.sh), the `offload` section still failed...
The server I'm using cannot connect to the internet. Can I configure DeepSpeed so that the offline environment does not slow down the NCCL initialization process?
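One thing that may help on an air-gapped machine is forcing the Hugging Face libraries into offline mode and pinning NCCL to a known network interface, so nothing waits on failed outbound connections. This is a hedged sketch: the environment variables below are real, but whether they address this particular slowdown, and the interface name `eth0`, are assumptions to adapt to your setup.

```shell
# Force Hugging Face libraries to use only the local cache (no Hub lookups)
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
export HF_DATASETS_OFFLINE=1

# Pin NCCL to a specific interface so it does not probe unreachable ones;
# "eth0" is a placeholder -- check your interfaces with `ip link`
export NCCL_SOCKET_IFNAME=eth0
```

These exports can be added near the top of the launch script before the `deepspeed` invocation.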
I also observed a slowdown with tensor_parallel 1.2.1 compared to native single-GPU performance.

### Setup

Llama-7b on 8 x A100 80GB (NVLink)

### Prompt

> "Count up from 100...
Thank you for sharing your findings on the performance of LLaMA 13B on Kaggle 2x T4. Good to know that you've identified the .generate() issue. I appreciate your efforts in...