RenyanDiao
https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/quant/analysis.py#L416 paddle gpu: 2.4.2.post112, paddleslim 2.4.1. Analyzing accuracy loss: the evaluation function cannot be used, so fp_int_cosine_similarity is used instead. Memory usage keeps rising until it finally fails with ResourceExhaustedError: Fail to alloc memory of 524288000 size, error code is 12. Sampling stage, Run batch:| | 0/1 W0531 14:29:24.876821 1627 sampler.cpp:189] bvar is busy...
### Describe the bug
MODEL_ID="/models/models--EleutherAI--gpt-neox-20b"
mkdir saved_results_gpt_neox
python run_gpt-neox_int8.py --ipex-weight-only-quantization --output-dir "saved_results_gpt_neox" --jit -m ${MODEL_ID} --int8
MODEL_ID="/models/models--bigscience--bloom-7b1"
mkdir saved_results_bloom
python run_bloom_int8.py --ipex-weight-only-quantization --output-dir "saved_results_bloom" --jit -m ${MODEL_ID} --int8-bf16-mixed
Loading checkpoint...
### Describe the bug
python run_llama_int8.py -m ${MODEL_ID} --quantized-model-path "./saved_results/best_model.pt" --benchmark --jit --int8-bf16-mixed
prompt = ["Is Apple an American company?", "Hugging Face Company is"]
Prompt size: 7
generate texts: ['Is...