Chinese-LLaMA-Alpaca
Loading the weights is extremely slow after merging
Thank you for using the Issue template. Please provide the relevant information by following the steps below. Issues with relatively complete information will be handled first; thanks for your cooperation.
Tip: put an x inside the [ ] to check a box. Delete these two lines when filing your issue. Keep only the options that apply and delete the rest.
Describe the problem in detail
Please describe the problem you encountered as specifically as possible. This will help us locate the issue more quickly.
Screenshots or logs
(If necessary) Please provide a text log or screenshots so that we can better understand the details of the problem.
Required checklist
- [x] Which model: Alpaca
- [x] Problem type: model conversion and merging; model inference (🤗 transformers)
- [ ] Since the related dependencies are updated frequently, please make sure you have followed the relevant steps in the Wiki
- [ ] I have read the FAQ section and searched existing issues for this problem, and found no similar issue or solution
- [ ] Third-party plugin issue: e.g. llama.cpp, text-generation-webui, LlamaChat; we also recommend looking for solutions in the corresponding project
It depends on the size of the model weights.
I found that it is slow on the first load, but for me it finishes within minutes. You can also check whether your CUDA environment is working normally.
BTW, does anybody know why the fine-tuned weights take about twice the file size of the merged weights when saved? I used stanford-alpaca to train with DeepSpeed. I'm not sure whether it was due to an incompatibility between transformers and DeepSpeed.
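For reference, a minimal sketch of the CUDA sanity check mentioned above (plain PyTorch, nothing project-specific):

```python
import torch

# Quick sanity check that PyTorch can actually see the GPU before
# blaming the load speed on the model itself.
print(torch.__version__)
print(torch.cuda.is_available())          # should print True on a working setup
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # the GPU PyTorch will use
```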
Did you save the model in float16 or float32? The merged weights and the released LoRA weights are stored in float16.
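In case it helps, a quick way to check which precision a checkpoint was actually saved in (the path below is a placeholder for your own checkpoint). float32 takes 4 bytes per parameter versus 2 for float16, which would explain the roughly 2x file size:

```python
import torch
from collections import Counter

# Load the checkpoint on CPU and count the tensor dtypes (placeholder path).
state_dict = torch.load("path/to/pytorch_model.bin", map_location="cpu")
print(Counter(v.dtype for v in state_dict.values() if torch.is_tensor(v)))
```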
Sorry, I just noticed that I saved it in the wrong precision. But I have another question: why does the model consume the same 26GB of GPU memory in both precisions (float32 and float16)?
Sorry to bother you. transformers loads models in float32 by default. You have to set the dtype explicitly when loading, or call half() afterwards to obtain a float16 model (that way you can manually call torch.cuda.empty_cache() to see the real GPU memory usage).
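A minimal sketch of both options (the model path is a placeholder, and AutoModelForCausalLM stands in for whatever model class you use):

```python
import torch
from transformers import AutoModelForCausalLM

MODEL_PATH = "path/to/merged-model"  # placeholder: your merged checkpoint

# Option 1: ask transformers to load the weights directly in float16.
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float16).cuda()

# Option 2: load in the default float32, then convert in place:
# model = AutoModelForCausalLM.from_pretrained(MODEL_PATH).half().cuda()
# torch.cuda.empty_cache()  # release the cached float32 blocks after converting

print(f"{torch.cuda.memory_allocated() / 1024**3:.1f} GiB allocated")
```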
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.
Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.