BH-L

Results: 3 issues of BH-L

How do I use this model for prediction and reconstruction?

Conversion and quantization succeed on a fine-tuned llama2 model, but running the model raises an error:

```
# ./llama -m ../llama2-13b-sft-filterd-v17/llama2-13b-sft-filterd-v17-inferllm-fp32.bin -g GPU --version 2
main: seed = 1709878763
total vocab length = 68419
weight tok_embeddings.weight is not match. Assert ' weight->length() == nr_number '...
```

What could cause an OOM at the replace-model stage? The model is Qwen32B, running on 7 A6000 GPUs. `nvidia-smi` shows all 7 cards active, but the model appears to be loaded only on gpu0.

![Image](https://github.com/user-attachments/assets/b59fad31-8890-43ea-869f-f809f7985c66)

Config file:

```
base:
    seed: &seed 42
model:
    type: Qwen2
    path: ./DeepSeek-R1-Distill-Qwen-32B
    tokenizer_mode: slow
    torch_dtype: auto
# calib:
#     name: pileval
#     download: True
#     path: ./LLMCompress/data...
```
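One way to confirm whether the weights were actually sharded across the GPUs (rather than all landing on gpu0, as `nvidia-smi` suggested) is to count parameters per device. The sketch below is a hypothetical diagnostic, not part of the config above; `device_histogram` is an illustrative helper, and the tiny `nn.Sequential` stands in for the real loaded Qwen2 model:

```python
# Hypothetical diagnostic: tally which device each parameter tensor lives on.
# If every parameter maps to a single device, the load was not sharded and
# an OOM on that device during replace-model would be expected.
from collections import Counter

import torch.nn as nn


def device_histogram(model: nn.Module) -> Counter:
    """Return a Counter mapping device strings to parameter-tensor counts."""
    return Counter(str(p.device) for p in model.parameters())


# Tiny stand-in model; in practice, pass the loaded Qwen2 model instead.
model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))
print(device_histogram(model))  # Counter({'cpu': 4}) — all tensors on one device
```

A balanced multi-GPU load would show parameter counts spread over several `cuda:N` entries instead of a single device.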