GPTQ-for-LLaMa
Errors encountered when running benchmark FP16 baseline on multiple GPUs
Trying to run the FP16 baseline benchmark for the LLaMA 30B model on a server with 8 V100 32 GB GPUs:
CUDA_VISIBLE_DEVICES=0,1 python llama.py /dev/shm/ly/models/hf_converted_llama/30B/ wikitext2 --benchmark 2048 --check
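For context on why the command above spans two GPUs, a back-of-the-envelope sketch: a 30B-parameter model in FP16 takes about 2 bytes per parameter, so the weights alone exceed a single 32 GB V100 and must be sharded across at least two cards. The helper names below are mine, purely for illustration; the calculation ignores activations, the KV cache, and memory fragmentation.

```python
import math

def fp16_weight_gib(n_params: float) -> float:
    """Approximate weight memory in GiB at FP16 (2 bytes per parameter)."""
    return n_params * 2 / (1024 ** 3)

def min_gpus_needed(n_params: float, gpu_gib: float = 32.0) -> int:
    """Smallest number of GPUs whose combined memory holds the weights
    (weights only -- no activations, KV cache, or fragmentation)."""
    return math.ceil(fp16_weight_gib(n_params) / gpu_gib)

if __name__ == "__main__":
    print(f"30B FP16 weights: ~{fp16_weight_gib(30e9):.1f} GiB")  # ~55.9 GiB
    print(f"Minimum 32 GB GPUs: {min_gpus_needed(30e9)}")          # 2
```

This is why `CUDA_VISIBLE_DEVICES=0,1` is the minimum viable setting here; a single GPU would OOM while loading the checkpoint.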
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:49<00:00, 7.09s/it]
Using the latest cached version of the module from /home/ly/.cache/huggingface/modules/datasets_modules/datasets/wikitext/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126 (last modified on Tue Apr 11 15:29:08 2023) since it couldn't be found locally at wikitext., or remotely on the Hugging Face Hub.
Found cached dataset wikitext (/home/ly/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Using the latest cached version of the module from /home/ly/.cache/huggingface/modules/datasets_modules/datasets/wikitext/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126 (last modified on Tue Apr 11 15:29:08 2023) since it couldn't be found locally at wikitext., or remotely on the Hugging Face Hub.
Found cached dataset wikitext (/home/ly/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Benchmarking ...
Traceback (most recent call last):
File "/home/ly/GPTQ-for-LLaMa/llama.py", line 492, in
accelerate nailed it:
Still not working. After replacing the code as mentioned, I got the following error: