Gzj369 comments

6 comments by Gzj369

[Question]: How to create indexes using local LLM and embedding from a local server path

> @Gzj369 dosu's reco on using OpenLLM seems like a good one. Does that at least cover the LLM aspect of your use case? For embeddings, would you be able...

When will a quantized model be available?

@GradientGuru The link to the quantized version of the Chat model seems to be broken, so I can't download it. Could you please take a look?

When will a quantized model be available?

Found it. The 8-bit quantized model can be downloaded from this link: https://huggingface.co/trillionmonster/Baichuan-13B-Chat-8bit/tree/main

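@golddream-y A question for you: I'm using Baichuan-13B-Chat as my base model, and my GPU has 20 GB of memory, so 8-bit quantization is my only option. In testing I found:

1. `model = model.quantize(8).cuda()`: online quantization doesn't work; it runs straight out of memory (OOM).
2. `load_in_8bit=True`: no OOM.

![image](https://github.com/baichuan-inc/Baichuan-13B/assets/13391430/c9c14722-7997-4d0b-b80d-c2690196095a)

What is the difference between 1 and 2?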
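Why 20 GB rules out fp16 can be checked with back-of-the-envelope arithmetic. The sketch below is an approximation under stated assumptions: it takes roughly 13 × 10^9 parameters for Baichuan-13B, reads "20G" as 20 GB (decimal), and counts only the weights (2 bytes per parameter in fp16, 1 byte in int8), ignoring activations, the KV cache, and CUDA/framework overhead. It does not explain the difference between options 1 and 2; it only shows why 8-bit is needed at all.

```python
# Rough estimate of the GPU memory needed just to hold model weights.
# Assumptions: ~13e9 parameters (approximate), decimal GB (1 GB = 1e9 bytes),
# weights only; activations, KV cache and runtime overhead are ignored.

BYTES_PER_PARAM = {"fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the memory in GB needed to store num_params weights in dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    params = 13e9  # approximate parameter count (assumption)
    gpu_gb = 20    # available GPU memory from the comment above
    for dtype in ("fp16", "int8"):
        need = weight_memory_gb(params, dtype)
        verdict = "fits" if need < gpu_gb else "does not fit"
        print(f"{dtype}: ~{need:.0f} GB of weights -> {verdict} in {gpu_gb} GB")
```

Because real usage adds overhead on top of the weights, an estimate that only barely fits should be treated as optimistic.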

Why does the int8 loading method show 16-bit?

This is an automatic reply from QQ Mail. Hello, I have received your email and will read and reply to it as soon as I see it.

How do I merge trained LoRA files into the base model?

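You can follow the way [LLaMA-Efficient-Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning/blob/main/README_zh.md) invokes it:

```bash
python src/web_demo.py \
    --model_name_or_path path_to_your_model \
    --template default \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint
```

Alternatively, try adding the following two lines to `init_model()` in `Baichuan-13B-Chat/web_demo.py`:

```python
from peft import PeftModel  # at the top of web_demo.py
# In init_model(), after loading the base model (lora_dir = your LoRA checkpoint path):
model = PeftModel.from_pretrained(model, lora_dir)
model = model.merge_and_unload()
```

@equationdz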
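For intuition about what the merge step does per adapted layer: assuming the standard LoRA formulation, a linear weight `W` (d_out × d_in) gets a low-rank update `B @ A` (B: d_out × r, A: r × d_in) scaled by `lora_alpha / r`, and merging simply adds that update into `W`. The sketch below is a toy illustration with hypothetical matrices in plain Python, not PEFT's implementation; it ignores options such as `fan_in_fan_out` or rsLoRA scaling.

```python
# Toy illustration of folding a LoRA update into a base weight matrix.
# Assumption: standard LoRA, W_merged = W + (lora_alpha / r) * (B @ A).
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product x @ y."""
    inner = len(y)
    assert all(len(row) == inner for row in x), "shape mismatch"
    cols = len(y[0])
    return [[sum(row[k] * y[k][j] for k in range(inner)) for j in range(cols)]
            for row in x]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, lora_alpha: float) -> Matrix:
    """Return W + (lora_alpha / r) * (B @ A), where r is the LoRA rank."""
    r = len(a)                  # A has shape r x d_in
    scaling = lora_alpha / r
    delta = matmul(b, a)        # d_out x d_in low-rank update
    return [[w_ij + scaling * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


if __name__ == "__main__":
    W = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]      # hypothetical base weight, 2 x 3
    A = [[1.0, 2.0, 3.0]]      # hypothetical rank-1 A, 1 x 3
    B = [[0.5],
         [-1.0]]               # hypothetical rank-1 B, 2 x 1
    print(merge_lora(W, A, B, lora_alpha=2.0))
```

Once the update is folded in, the layer computes the same output as base-plus-adapter with a single matrix, which is why a merged model can be used without loading the adapter separately.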