RenyanDiao issues

Results: 3 issues of RenyanDiao

PaddleSlim quantization sensitivity analysis has a memory leak when similarity is used to evaluate accuracy loss

https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/quant/analysis.py#L416 Environment: paddle gpu 2.4.2 post112, paddleslim 2.4.1. Accuracy-loss analysis: the evaluation function cannot be used, so fp_int_cosine_similarity is used instead. Memory usage keeps rising until the run fails with ResourceExhaustedError: Fail to alloc memory of 524288000 size, error code is 12. Sampling stage, Run batch:| | 0/1 W0531 14:29:24.876821 1627 sampler.cpp:189] bvar is busy...
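For readers unfamiliar with the metric named in this report, the following is a minimal pure-Python sketch of comparing float and quantized outputs by cosine similarity. It is not PaddleSlim's implementation and does not diagnose the reported leak. The names `cosine_similarity` and `mean_similarity` are hypothetical, and the outputs are assumed to be flat lists of floats. The sketch keeps only a running sum per batch, which is one way to avoid retaining every batch's outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as having no similarity (an assumption).
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_similarity(fp_batches, int_batches):
    """Average per-batch similarity, holding only a running scalar sum."""
    total, count = 0.0, 0
    for fp_out, int_out in zip(fp_batches, int_batches):
        total += cosine_similarity(fp_out, int_out)
        count += 1
    return total / count if count else 0.0
```

A value near 1.0 would suggest the quantized outputs track the float outputs closely. Lower values would point to larger accuracy loss for the layer under test.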

Failed to do quantization for models like EleutherAI/gpt-neox-20b and bigscience/bloom-7b1

### Describe the bug
MODEL_ID="/models/models--EleutherAI--gpt-neox-20b"
mkdir saved_results_gpt_neox
python run_gpt-neox_int8.py --ipex-weight-only-quantization --output-dir "saved_results_gpt_neox" --jit -m ${MODEL_ID} --int8
MODEL_ID="/models/models--bigscience--bloom-7b1"
mkdir saved_results_bloom
python run_bloom_int8.py --ipex-weight-only-quantization --output-dir "saved_results_bloom" --jit -m ${MODEL_ID} --int8-bf16-mixed
Loading checkpoint...

CPU

Crash

LLM

Batch-running the sample run_llama_int8.py generates the same repeated content as the answer to the first question

### Describe the bug
python run_llama_int8.py -m ${MODEL_ID} --quantized-model-path "./saved_results/best_model.pt" --benchmark --jit --int8-bf16-mixed
prompt = ["Is Apple an American company?", "Hugging Face Company is"]
Prompt size: 7
generate texts: ['Is...

CPU

Query

LLM