optimum
Load Time data errors in benchmark_gptq
System Info
- Platform: Windows-10
- Python version: 3.10.13
- PyTorch version: 2.1.2+cu118
- Huggingface_hub version: 0.20.1
- Transformers version: 4.36.2
- Accelerate version: 0.25.0
- optimum version: 1.16.1
- auto_gptq version: 0.5.1+cu118
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
python benchmark_gptq.py --model TheBloke/llava-v1.5-13B-GPTQ --num-batches 4 --batch-size 1 --gptq --task text-generation --use-exllama --exllama-version 2 --generate
Expected behavior
I ran the benchmark on the same model several times, but the reported load time varies a lot between runs.
The model is 6.7 GB and stored on a hard drive, so I believe the ~67 second figure is the correct one.
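For reference, a rough way to see the variation outside the benchmark script is to time `from_pretrained` directly over a few repetitions. The model ID is taken from the reproduction command above; the loading kwargs are assumptions and may need adjusting for this GPTQ checkpoint:

```python
# Rough timing sketch (not the benchmark script itself): time a few consecutive
# loads of the same checkpoint and compare the numbers.
import time

import torch
from transformers import AutoModelForCausalLM

model_id = "TheBloke/llava-v1.5-13B-GPTQ"

for run in range(3):
    start = time.perf_counter()
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    elapsed = time.perf_counter() - start
    print(f"run {run}: load time {elapsed:.1f} s")

    # Drop the model between runs. Note that the weight files can stay in the
    # OS page cache, so runs after the first may load much faster than a true
    # cold start from the hard drive.
    del model
    torch.cuda.empty_cache()
```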
Thank you @kirayomato, I remember noticing large variability there as well. The relevant code is here:
https://github.com/huggingface/optimum/blob/0f0a66303425b476bd5e209c076419a404238bb3/tests/benchmark/benchmark_gptq.py#L306-L329
https://github.com/huggingface/optimum/blob/0f0a66303425b476bd5e209c076419a404238bb3/tests/benchmark/benchmark_gptq.py#L354-L355
Does something look wrong to you there?
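One thing worth ruling out is OS-level file caching rather than the timing code itself: on a hard drive, a cold load is dominated by disk reads, while a warm load mostly measures deserialization. A rough way to check (the path resolution via `snapshot_download` and the loading kwargs are assumptions, not what the benchmark does):

```python
# Hedged sketch: pre-read the checkpoint files so the OS page cache is warm,
# then time the load. If the warm-cache number is stable while the benchmark's
# number is not, the variance likely comes from disk I/O, not the timing code.
import os
import time

from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM

model_id = "TheBloke/llava-v1.5-13B-GPTQ"
local_dir = snapshot_download(model_id)  # resolves the already-downloaded snapshot

# Read every file once so a following load hits the page cache.
for root, _, files in os.walk(local_dir):
    for name in files:
        with open(os.path.join(root, name), "rb") as f:
            while f.read(1 << 20):
                pass

start = time.perf_counter()
AutoModelForCausalLM.from_pretrained(local_dir, device_map="auto")
print(f"warm-cache load time: {time.perf_counter() - start:.1f} s")
```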