sunjunlishi

Results: 124 comments by sunjunlishi

If someone released a small model with good accuracy, it would make a real splash in embedded face recognition.

> @fmmoret I can confirm that I was able to run a quantized model + adapters using your branch, and the results were good.

Which branch?

Hi, this is slow. Using two GPUs should double the speed, but how do I set that up? The problem at the loss computation step still cannot be resolved. How do I solve it?

A simple attempted fix: change the dimensions of the output:

```python
preds = model(inp).cpu()
ps = preds.size()
new_shape = (ps[0] // 2, ps[1] * 2, ps[2])
preds = preds.view(*new_shape)
```
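For context, here is a minimal sketch of how the two-GPU setup might be wired with `torch.nn.DataParallel`. This is an assumption for illustration; the thread does not show this part of the code.

```python
# Hypothetical two-GPU inference via DataParallel; the device ids are assumptions.
import torch

model = torch.nn.DataParallel(model, device_ids=[0, 1]).cuda()
preds = model(inp.cuda()).cpu()  # the input batch is split across both GPUs and re-gathered
```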

@chichuhu How exactly should the code be changed? Please advise.

The speed is 80 ms with 4.jpg, which has 4 faces. Don't divide the clock time; dividing it makes the measured time wrong by a factor of 10.
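A minimal sketch of measuring elapsed time per image, written in Python for illustration (an assumption; the project under discussion likely times in C/C++ with `clock()`, and `detect_faces` here is a hypothetical stand-in for the real detection call):

```python
import time

start = time.perf_counter()
faces = detect_faces(image)  # hypothetical detection call
elapsed_ms = (time.perf_counter() - start) * 1000.0  # seconds -> milliseconds
print(f"{elapsed_ms:.1f} ms for {len(faces)} faces")
```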

The code here in LLaMA-Factory-main/src/llmtuner/model/adapter.py unifies the fine-tuned model and the base model. Remarkable.

```python
if adapter_to_resume is not None:  # resume LoRA training
    print('to resume....')
    model = PeftModel.from_pretrained(model, adapter_to_resume, is_trainable=is_trainable)
```
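As a hedged follow-on (not part of the quoted adapter.py code): once training is done, PEFT can fold the LoRA weights back into the base model for deployment.

```python
# Assumption: `model` is the PeftModel loaded above; the output path is illustrative.
merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights
merged.save_pretrained("merged-model")
```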

@hiyouga Why train with a quantized model? Because using the quantized version of a large model for training gives high accuracy, the loss drops quickly, and it uses less GPU memory. The whole pipeline works end to end now: training and running the demo. The only drawback is that it is somewhat slow. Training on a 4-bit quantized 14B model works better than on an unquantized 7B model.
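For reference, a minimal sketch of this kind of 4-bit quantized LoRA training setup using transformers, bitsandbytes, and peft. The model name and hyperparameters are illustrative assumptions, not taken from the thread:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,               # keep the base weights in 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-14b-model",       # illustrative 14B base model
    quantization_config=bnb_config,
    device_map="auto",
)
lora = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)  # only the small LoRA matrices are trainable
model.print_trainable_parameters()
```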

### **Can vLLM support loading LoRA?**

https://github.com/vllm-project/vllm/issues/2710

[jvmncs](https://github.com/jvmncs) commented [on Feb 6](https://github.com/vllm-project/vllm/issues/2710#issuecomment-1928128035):

> Have a look at this example: https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py

[simon-mo](https://github.com/simon-mo) commented [2 weeks ago](https://github.com/vllm-project/vllm/issues/2710#issuecomment-1967691477):

> And documentation here: https://docs.vllm.ai/en/latest/models/lora.html...
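A minimal sketch of LoRA inference with vLLM, in the spirit of the linked multilora_inference.py example (the model name and adapter path are illustrative assumptions):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora tells vLLM to allow serving adapter weights on top of the base model
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)
sampling = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(
    ["Explain LoRA in one sentence."],
    sampling,
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),  # name, id, local path
)
print(outputs[0].outputs[0].text)
```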