Bill Xu
Thank you for responding to my request. I have done `pip install git+https://github.com/huggingface/transformers` and `pip install git+https://github.com/huggingface/accelerate`, as confirmed by [pip_freeze](https://github.com/cxxz/llama_deepspeed_autotune/blob/main/zero2_autotuner_logs/pip_freeze_all.txt). However, upon rerunning [run_autotune_llama_4A100.sh](https://github.com/cxxz/llama_deepspeed_autotune/blob/main/run_autotune_llama_4A100.sh), the `offload` section still failed...
The server I'm using cannot connect to the internet. Can I configure DeepSpeed so that the offline environment does not slow down the NCCL initialization process?
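One thing that may help on an air-gapped machine is forcing the Hugging Face libraries into offline mode and pinning NCCL to a known network interface, so nothing waits on failed outbound connections. This is a hedged sketch: the environment variables below are real, but whether they address this particular slowdown, and the interface name `eth0`, are assumptions to adapt to your setup.

```shell
# Force Hugging Face libraries to use only the local cache (no Hub lookups)
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
export HF_DATASETS_OFFLINE=1

# Pin NCCL to a specific interface so it does not probe unreachable ones;
# "eth0" is a placeholder -- check your interfaces with `ip link`
export NCCL_SOCKET_IFNAME=eth0
```

These exports can be added near the top of the launch script before the `deepspeed` invocation.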
I also observed a slowdown with tensor_parallel 1.2.1 compared to native single-GPU performance.

### Setup

Llama-7b on 8 x A100 80GB (NVLink)

### Prompt

> "Count up from 100...
Thank you for sharing your findings on the performance of LLaMA 13B on Kaggle 2x T4. Good to know that you've identified the .generate() issue. I appreciate your efforts in...