optimum
Load Time data errors in benchmark_gptq
System Info
- Platform: Windows-10
- Python version: 3.10.13
- PyTorch version: 2.1.2+cu118
- Huggingface_hub version: 0.20.1
- Transformers version: 4.36.2
- Accelerate version: 0.25.0
- optimum version: 1.16.1
- auto_gptq version: 0.5.1+cu118
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
python benchmark_gptq.py --model TheBloke/llava-v1.5-13B-GPTQ --num-batches 4 --batch-size 1 --gptq --task text-generation --use-exllama --exllama-version 2 --generate
Expected behavior
I ran the benchmark on the same model several times, but the reported load time varies a lot between runs.
The model is 6.7 GB and stored on a hard drive, so I believe the ~67 second figure is the correct one.
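For reference, a rough way to see the variation outside the benchmark script is to time `from_pretrained` directly over a few repetitions. The model ID is taken from the reproduction command above; the loading kwargs are assumptions and may need adjusting for this GPTQ checkpoint:

```python
# Rough timing sketch (not the benchmark script itself): time a few consecutive
# loads of the same checkpoint and compare the numbers.
import time

import torch
from transformers import AutoModelForCausalLM

model_id = "TheBloke/llava-v1.5-13B-GPTQ"

for run in range(3):
    start = time.perf_counter()
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    elapsed = time.perf_counter() - start
    print(f"run {run}: load time {elapsed:.1f} s")

    # Drop the model between runs. Note that the weight files can stay in the
    # OS page cache, so runs after the first may load much faster than a true
    # cold start from the hard drive.
    del model
    torch.cuda.empty_cache()
```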
Thank you @kirayomato, I remember noticing large variability there as well. The relevant code is here:
https://github.com/huggingface/optimum/blob/0f0a66303425b476bd5e209c076419a404238bb3/tests/benchmark/benchmark_gptq.py#L306-L329
https://github.com/huggingface/optimum/blob/0f0a66303425b476bd5e209c076419a404238bb3/tests/benchmark/benchmark_gptq.py#L354-L355
Does something look wrong to you there?
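One thing worth ruling out is OS-level file caching rather than the timing code itself: on a hard drive, a cold load is dominated by disk reads, while a warm load mostly measures deserialization. A rough way to check (the path resolution via `snapshot_download` and the loading kwargs are assumptions, not what the benchmark does):

```python
# Hedged sketch: pre-read the checkpoint files so the OS page cache is warm,
# then time the load. If the warm-cache number is stable while the benchmark's
# number is not, the variance likely comes from disk I/O, not the timing code.
import os
import time

from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM

model_id = "TheBloke/llava-v1.5-13B-GPTQ"
local_dir = snapshot_download(model_id)  # resolves the already-downloaded snapshot

# Read every file once so a following load hits the page cache.
for root, _, files in os.walk(local_dir):
    for name in files:
        with open(os.path.join(root, name), "rb") as f:
            while f.read(1 << 20):
                pass

start = time.perf_counter()
AutoModelForCausalLM.from_pretrained(local_dir, device_map="auto")
print(f"warm-cache load time: {time.perf_counter() - start:.1f} s")
```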