
A pure C++ LLM acceleration library for all platforms, callable from Python. ChatGLM-6B-class models can exceed 10,000 tokens/s on a single GPU. Supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices.
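For context on the reports below, here is a minimal sketch of the advertised Python calling path. It assumes a model already converted to fastllm's .flm format; the file name is hypothetical, and `llm.model` / `model.response` follow the project's published examples rather than anything quoted in these issues:

```
# Minimal sketch: load a pre-converted .flm model and generate a reply.
# "chatglm2-6b-int4.flm" is a hypothetical path.
from fastllm_pytools import llm

model = llm.model("chatglm2-6b-int4.flm")
print(model.response("Hello"))
```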

Results: 170 fastllm issues, sorted by most recently updated

On Windows, `import fastllm_pytools` fails. Is this related to the "two-line call" not being supported yet? (A sketch of that call follows.)
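The "two-line call" refers to converting an already-loaded Hugging Face model into a fastllm model in place. A minimal sketch, assuming the `llm.from_hf` call quoted in an issue further down; the ChatGLM2-6B paths are illustrative:

```
from transformers import AutoModel, AutoTokenizer
from fastllm_pytools import llm

# Load the original Hugging Face model (paths are illustrative).
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
hf_model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)

# The "two-line" part: hand the HF model over to fastllm for accelerated inference.
model = llm.from_hf(hf_model, tokenizer, dtype="float16")
```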

When using web_api.py, the problem is as described in the title. How can this be resolved, and where is it going wrong? Also, only the batch API endpoint returns results; the stream API returns nothing.

The only output printed is: Segmentation fault. Usage is:

```
tokenizer = AutoTokenizer.from_pretrained(
    "../ChatGLM2-6B/model/chatglm2-6b", trust_remote_code=True
)
model = AutoModel.from_pretrained("../ChatGLM2-6B/model/chatglm2-6b", trust_remote_code=True)
from fastllm_pytools import llm
model = llm.from_hf(model, tokenizer, dtype = "float16")
start = time.perf_counter()
count=0...
```

```
Traceback (most recent call last):
  File "/home/zengzijian/ai_code/fastllm/build/tools/baichuan2flm.py", line 18, in <module>
    torch2flm.tofile(exportPath, model, tokenizer, dtype = dtype)
  File "/home/zengzijian/ai_code/fastllm/build/tools/fastllm_pytools/torch2flm.py", line 156, in tofile
    cur = dict[key].numpy().astype(ori_np_data_type)
TypeError: can't convert cuda:0 device...
```
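The error means the weights are still on the GPU: PyTorch's `Tensor.numpy()` only accepts CPU tensors. This is standard PyTorch behavior, not a confirmed fastllm fix, but a minimal sketch of the usual remedy:

```
import torch

# Tensor.numpy() raises "can't convert cuda:0 device type tensor to numpy"
# for GPU tensors; the data has to be copied to host memory first.
device = "cuda" if torch.cuda.is_available() else "cpu"
t = torch.randn(2, 2, device=device)
arr = t.cpu().numpy()  # safe either way: .cpu() returns the same tensor if already on CPU

# For the export above, moving the model to CPU before the call avoids the
# error (variable names as in the traceback; this fix is an assumption):
# model = model.cpu()
# torch2flm.tofile(exportPath, model, tokenizer, dtype=dtype)
```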

I converted llama2-7b using `fastllm_pytools.torch2flm`. The inference result looks wrong, and is also **inconsistent** with the result of running llama2-7b directly. Prompt: **The president of the United States is**; generated result: ``` ###...
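For reference, a minimal sketch of the conversion path under discussion, pieced together from the `torch2flm.tofile` signature in the traceback above; the llama2-7b identifiers, output path, and dtype are assumptions:

```
from transformers import AutoModelForCausalLM, AutoTokenizer
from fastllm_pytools import torch2flm

# Paths are illustrative; dtype choices commonly include "float16" and "int4".
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Export the weights to fastllm's .flm format (signature as in the traceback above).
torch2flm.tofile("llama2-7b.flm", model, tokenizer, dtype="float16")
```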

With the Qwen model, once the history array exceeds 22 entries, or the input exceeds 4,000 Chinese characters or English words, GPU memory balloons and the process dies on its own! With a history array over 50, or more than 9,000 characters/words, it errors with "too many...

Error message:

```
python3 -m ftllm.chat -t 16 -p ~/llm/fastllm/models/ --dtype int4
Load AutoTokenizer failed. (you can try install transformers)
Try load fastllm tokenizer.
zsh: segmentation fault  python3 -m ftllm.chat -t 16...
```