jamesljl
I've encountered the same problem. Same steps as above: no errors while merging and quantizing, but it just hangs when loading the model, before the ">" prompt appears. The model seems to be loaded, though...
> Download the previous version: https://github.com/ggerganov/llama.cpp/releases

The previous version doesn't work either; it can't even load the 8-bit quantized model.
I just reinstalled the Ubuntu VM, cloned the latest version, and re-compiled it. Then it worked. That's weird.
For a single-machine multi-GPU setup, is it enough to just set something like os.environ["CUDA_VISIBLE_DEVICES"]="0,1,2,3"? Or do I also need to wrap the model with nn.DataParallel? A sketch of both is below.
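For reference, here is a minimal sketch of both options, assuming a standard PyTorch setup; the `nn.Linear` model is just a placeholder for the real one. Note that `CUDA_VISIBLE_DEVICES` only controls which GPUs the process can see, while `nn.DataParallel` is what actually splits each batch across them, so the two are complementary rather than alternatives.

```python
import os

# Must be set before torch initializes CUDA; this only restricts GPU
# visibility and does not by itself parallelize anything.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

import torch
import torch.nn as nn

model = nn.Linear(1024, 1024)  # placeholder for the real model

if torch.cuda.device_count() > 1:
    # Replicates the model on each visible GPU and splits every input
    # batch along dim 0 at forward time.
    model = nn.DataParallel(model)
model = model.cuda()

x = torch.randn(8, 1024).cuda()
y = model(x)  # the batch of 8 is sharded across the 4 visible GPUs
```

(For serious multi-GPU training, `torch.nn.parallel.DistributedDataParallel` is generally preferred over `nn.DataParallel`, but the above is the smallest change.)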
I ran into the same problem. I'm using the alpaca2-7B 8-bit quantized model (ggml-model-q8_0.bin); it behaves strangely, producing lots of nonsense, and the prompt doesn't seem to work.
Adaptation for the o1 series of models isn't finished yet, let alone o3. The o1 models don't support max_tokens or the system role, and I hear there are also issues with tokens under streaming. It feels like there hasn't been a new release in a long time.
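To illustrate the quirks mentioned above, here is a minimal sketch using the official OpenAI Python SDK; it reflects OpenAI's documented o1 behavior at the time, not this project's code, and the prompt content is made up.

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="o1-mini",
    # o1 models reject the max_tokens parameter; the replacement is
    # max_completion_tokens.
    max_completion_tokens=1024,
    # Early o1 models also reject the "system" role, so any system-style
    # instructions have to be folded into the user message.
    messages=[
        {"role": "user", "content": "You are a helpful assistant.\n\nHello!"}
    ],
    # stream=True was initially unsupported on o1, so it is omitted here.
)
print(resp.choices[0].message.content)
```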