nuaabuaa07
Could someone explain what "modify the email-related settings in docker-compose.yml" means? I don't quite understand it. Which settings exactly need to be added, and is the email here an ordinary mailbox like QQ Mail? Could you provide an example?
This is my config in .env, but I still get an error. This is the error message: ERROR [pilot.scene.base_chat] model response parase faild!Model server error!code=1, errmsg is **LLMServer Generate Error, Please CheckErrorInfo.**: Error...
I hit a similar error. After pulling the latest code again today, the error is indeed gone and everything works now.
> I surprisingly found that after I updated the ollama version, the sha256-related problem disappeared. I will work on the performance of this quantized model recently. Thanks for the community's...
Does that mean the inference service can only be deployed on a single-GPU machine?
With a single card I run out of memory: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 22.20 GiB total capacity; 21.53 GiB already allocated; 48.12 MiB free; 21.55 GiB reserved in total by...
My error is similar to yours; the difference is that it complains that running on two GPUs is not allowed. I haven't found a solution yet.
I have dual A10 cards and also get the error that multiple GPUs are not supported. Could you explain in detail how to use multiple GPUs?
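
Not a confirmed fix for this project, but with plain transformers/accelerate you can usually shard a model across two cards via `device_map="auto"`. A minimal sketch; the checkpoint path and the per-GPU memory caps below are assumptions (sized for 22 GiB A10s), not values from this thread:

```python
# Sketch: shard a LLaMA-style model across two GPUs with transformers + accelerate.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

ziya_model_path = "/path/to/ziya-llama-13b"  # hypothetical path, adjust to yours

model = LlamaForCausalLM.from_pretrained(
    ziya_model_path,
    torch_dtype=torch.float16,            # halve the footprint vs fp32
    device_map="auto",                    # let accelerate place layers on GPU 0/1
    max_memory={0: "20GiB", 1: "20GiB"},  # leave headroom on each 22 GiB card
)
tokenizer = LlamaTokenizer.from_pretrained(ziya_model_path)
```

If loading still fails, combining this with the quantized loading discussed below reduces memory further.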
Is this the right configuration for loading the model with 8-bit quantization?

```python
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    ziya_model_path,
    # torch_dtype=torch.float16,
    load_in_8bit=True,
    device_map="auto",
)
```
> Is this the right configuration for loading the model with 8-bit quantization?

Passing `load_in_8bit=True` directly raises an error; you need to go through a quantization config instead, like this:

```python
from transformers import BitsAndBytesConfig, LlamaForCausalLM

nf4_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
model = LlamaForCausalLM.from_pretrained(
    ziya_model_path,
    quantization_config=nf4_config,
    device_map='auto',
)
```
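
Note that the snippet above configures 4-bit NF4 quantization, not 8-bit. If you specifically want 8-bit, the same `BitsAndBytesConfig` route should work; a hedged sketch, assuming a recent transformers with bitsandbytes installed:

```python
# Sketch of the 8-bit equivalent via BitsAndBytesConfig (assumes recent
# transformers + bitsandbytes); ziya_model_path is the path from this thread.
from transformers import BitsAndBytesConfig, LlamaForCausalLM

int8_config = BitsAndBytesConfig(load_in_8bit=True)
model = LlamaForCausalLM.from_pretrained(
    ziya_model_path,
    quantization_config=int8_config,  # preferred over the bare load_in_8bit flag
    device_map="auto",
)
```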