
Load Time data errors in benchmark_gptq

Open • kirayomato opened this issue 1 year ago • 1 comment

System Info

- Platform: Windows-10
- Python version: 3.10.13
- PyTorch version: 2.1.2+cu118
- Huggingface_hub version: 0.20.1
- Transformers version: 4.36.2
- Accelerate version: 0.25.0
- optimum version: 1.16.1
- auto_gptq version: 0.5.1+cu118

Who can help?

No response

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

python benchmark_gptq.py --model TheBloke/llava-v1.5-13B-GPTQ --num-batches 4 --batch-size 1 --gptq --task text-generation --use-exllama --exllama-version 2 --generate

Expected behavior

I ran the benchmark many times, but the load time it reports varies a lot between runs. (Screenshot: 2024-01-09_13,59,37_wps_Ofi6) The model is 6.7 GB and is stored on a hard drive, so I think the 67-second measurement is the correct one.
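
As a rough sanity check on the 67-second figure, the arithmetic can be sketched as below. The function name is hypothetical, and the calculation assumes decimal gigabytes and that the load time is dominated by reading the file from disk, which may not hold for this benchmark.

```python
def implied_read_throughput_mb_s(size_gb: float, load_seconds: float) -> float:
    """Return the disk throughput (MB/s) implied by reading size_gb in load_seconds.

    Assumption: 1 GB = 1000 MB and the whole file is read exactly once.
    """
    if load_seconds <= 0:
        raise ValueError("load_seconds must be positive")
    return size_gb * 1000.0 / load_seconds


# Figures reported in this issue: a 6.7 GB model and a 67 s load time.
throughput = implied_read_throughput_mb_s(6.7, 67.0)
print(f"Implied throughput: {throughput:.1f} MB/s")
```

About 100 MB/s is in the range usually expected for sequential reads from a hard drive, which is consistent with 67 seconds being an uncached read. Much shorter load times would then suggest the file was served from the operating system's file cache, though this is a guess and not something the benchmark output confirms.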

kirayomato, Jan 09 '24 06:01

Thank you @kirayomato, I remember noticing large variability there as well. The relevant code is here:
https://github.com/huggingface/optimum/blob/0f0a66303425b476bd5e209c076419a404238bb3/tests/benchmark/benchmark_gptq.py#L306-L329
https://github.com/huggingface/optimum/blob/0f0a66303425b476bd5e209c076419a404238bb3/tests/benchmark/benchmark_gptq.py#L354-L355

Does something look wrong to you there?
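
One way to check whether the variability comes from the measurement or from the environment is to time the same load several times in one process and compare the first run with the later ones. The sketch below is not the code from benchmark_gptq.py: `time_repeated` and `fake_load` are hypothetical names, and `fake_load` is a stand-in for whatever call actually loads the model.

```python
import statistics
import time
from typing import Callable, List


def time_repeated(load_fn: Callable[[], object], repeats: int = 3) -> List[float]:
    """Call load_fn `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        load_fn()
        durations.append(time.perf_counter() - start)
    return durations


def fake_load() -> None:
    # Placeholder for the real model-loading call (an assumption for illustration).
    time.sleep(0.01)


durations = time_repeated(fake_load, repeats=3)
print(f"first run: {durations[0]:.3f} s")
print(f"median:    {statistics.median(durations):.3f} s")
```

If the first run is much slower than the rest, file caching or one-time initialization is a likely cause. When the load involves GPU work, the timer may also need a device synchronization before it is stopped, since GPU operations can return before they finish.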

fxmarty, Jan 09 '24 09:01