
A pure C++ LLM acceleration library for all platforms, callable from Python. ChatGLM-6B-class models can exceed 10,000 tokens/s on a single GPU. Supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices.
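For context on the reports below, here is a minimal sketch of the advertised Python calling path. It assumes a model already converted to fastllm's .flm format; the file name is hypothetical, and `llm.model` / `model.response` follow the project's published examples rather than anything quoted in these issues:

```
# Minimal sketch: load a pre-converted .flm model and generate a reply.
# "chatglm2-6b-int4.flm" is a hypothetical path.
from fastllm_pytools import llm

model = llm.model("chatglm2-6b-int4.flm")
print(model.response("Hello"))
```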

Results: 170 fastllm issues, sorted by most recently updated

On Windows, `import fastllm_pytools` fails. Is this related to the "two-line call" not being supported yet? (A sketch of that call follows.)
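The "two-line call" refers to converting an already-loaded Hugging Face model into a fastllm model in place. A minimal sketch, assuming the `llm.from_hf` call quoted in an issue further down; the ChatGLM2-6B paths are illustrative:

```
from transformers import AutoModel, AutoTokenizer
from fastllm_pytools import llm

# Load the original Hugging Face model (paths are illustrative).
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
hf_model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)

# The "two-line" part: hand the HF model over to fastllm for accelerated inference.
model = llm.from_hf(hf_model, tokenizer, dtype="float16")
```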

When using web_api.py, the problem is as described in the title. How can this be resolved, and where is it going wrong? Also, only the batch API endpoint returns results; the stream API returns nothing.

The only output printed is: Segmentation fault. Usage is:

```
tokenizer = AutoTokenizer.from_pretrained(
    "../ChatGLM2-6B/model/chatglm2-6b", trust_remote_code=True
)
model = AutoModel.from_pretrained("../ChatGLM2-6B/model/chatglm2-6b", trust_remote_code=True)
from fastllm_pytools import llm
model = llm.from_hf(model, tokenizer, dtype = "float16")
start = time.perf_counter()
count=0...
```

```
Traceback (most recent call last):
  File "/home/zengzijian/ai_code/fastllm/build/tools/baichuan2flm.py", line 18, in <module>
    torch2flm.tofile(exportPath, model, tokenizer, dtype = dtype)
  File "/home/zengzijian/ai_code/fastllm/build/tools/fastllm_pytools/torch2flm.py", line 156, in tofile
    cur = dict[key].numpy().astype(ori_np_data_type)
TypeError: can't convert cuda:0 device...
```
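The error means the weights are still on the GPU: PyTorch's `Tensor.numpy()` only accepts CPU tensors. This is standard PyTorch behavior, not a confirmed fastllm fix, but a minimal sketch of the usual remedy:

```
import torch

# Tensor.numpy() raises "can't convert cuda:0 device type tensor to numpy"
# for GPU tensors; the data has to be copied to host memory first.
device = "cuda" if torch.cuda.is_available() else "cpu"
t = torch.randn(2, 2, device=device)
arr = t.cpu().numpy()  # safe either way: .cpu() returns the same tensor if already on CPU

# For the export above, moving the model to CPU before the call avoids the
# error (variable names as in the traceback; this fix is an assumption):
# model = model.cpu()
# torch2flm.tofile(exportPath, model, tokenizer, dtype=dtype)
```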

I converted llama2-7b using `fastllm_pytools.torch2flm`. The inference result looks wrong, and is also **inconsistent** with the result of running llama2-7b directly. Prompt: **The president of the United States is**; generated result: ``` ###...
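For reference, a minimal sketch of the conversion path under discussion, pieced together from the `torch2flm.tofile` signature in the traceback above; the llama2-7b identifiers, output path, and dtype are assumptions:

```
from transformers import AutoModelForCausalLM, AutoTokenizer
from fastllm_pytools import torch2flm

# Paths are illustrative; dtype choices commonly include "float16" and "int4".
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Export the weights to fastllm's .flm format (signature as in the traceback above).
torch2flm.tofile("llama2-7b.flm", model, tokenizer, dtype="float16")
```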

With the Qwen model, once the history array exceeds 22 entries, or the input exceeds 4,000 Chinese characters or English words, GPU memory balloons and the process dies on its own! With a history array over 50, or more than 9,000 characters/words, it errors with "too many...

Error message:

```
python3 -m ftllm.chat -t 16 -p ~/llm/fastllm/models/ --dtype int4
Load AutoTokenizer failed. (you can try install transformers)
Try load fastllm tokenizer.
zsh: segmentation fault  python3 -m ftllm.chat -t 16...
```