Lzhang-hub

18 comments by Lzhang-hub

I tested with MODEL_TYPE=llava and it does not work. Was this closed because it is already supported?

These are the error logs:

```
Traceback (most recent call last):
  File "/data1/nfs15/nfs/bigdata/zhanglei/conda/envs/rtp-llm-0227/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data1/nfs15/nfs/bigdata/zhanglei/conda/envs/rtp-llm-0227/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/maga_transformer/start_server.py", ...
```

![image](https://github.com/ztxz16/fastllm/assets/57925599/17e0a196-2b73-4083-b032-e9a59b4d7536)

I looked at the source code: inside ForwardBatch, Forward is called in a for loop? Doesn't that mean batching gives no speedup at all?

@ztxz16 Ah, I see it now. But why would the latency grow linearly? With the chatglm2 model, testing directly against the /api/batch_chat endpoint of web_api in the demo also shows latency growing linearly with the length of the list.

> This way of testing measures the prefill stage; I suggest testing the decode stage.

@wildkid1024 My understanding is that testing directly against a web API deployment better matches the real model-serving scenario. We currently have a production requirement for batch inference; if we only test the decode stage, won't that fail to reflect the actual inference scenario?
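For reference, a minimal timing sketch against the /api/batch_chat endpoint mentioned above. The host, port, and payload format (`"prompts"` as a list of strings) are assumptions, not fastllm's documented schema, so adjust them to match the actual demo server. If batching is effective, latency should grow sublinearly with batch size; linear growth suggests the requests are processed one by one.

```python
# Hypothetical timing sketch for the /api/batch_chat endpoint.
# The URL and JSON payload shape are assumptions -- adapt to the demo server.
import time

import requests

URL = "http://127.0.0.1:8000/api/batch_chat"  # assumed host/port

for batch_size in (1, 2, 4, 8):
    prompts = ["Briefly introduce large language models."] * batch_size
    start = time.time()
    resp = requests.post(URL, json={"prompts": prompts}, timeout=600)
    resp.raise_for_status()
    elapsed = time.time() - start
    # Sublinear growth in elapsed time indicates real batching.
    print(f"batch_size={batch_size}: {elapsed:.2f}s")
```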

Same issue, any update?

@symphonylyh I cast input_ids to int32 with `input_ids = input_ids.to(torch.int32)` and got the same result.

I used torch.randint() to generate input_ids instead of the tokenizer; for the same input_ids, the results of the TRT-LLM model and the HF model still differ.

```python
import argparse
import json
import os
...
```
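For context, a minimal sketch of that comparison setup, assuming a hypothetical model path: seeded random input_ids (so both runtimes receive identical tokens) run through the HF model with greedy decoding. The TRT-LLM side is elided because its runner API varies across versions; the same ids, cast to int32 as discussed above, would be fed to the engine and the generated token sequences diffed.

```python
# Sketch only: MODEL_DIR is a placeholder, and the TRT-LLM call is omitted.
import torch
from transformers import AutoModelForCausalLM

MODEL_DIR = "/path/to/hf_model"  # placeholder path
SEQ_LEN = 32

torch.manual_seed(0)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR, torch_dtype=torch.float16
).to("cuda").eval()

# Fixed random token ids so both runtimes see identical inputs.
vocab_size = model.config.vocab_size
input_ids = torch.randint(0, vocab_size, (1, SEQ_LEN), dtype=torch.int32)

# HF reference run: greedy decoding (HF embedding lookups expect int64).
hf_out = model.generate(
    input_ids.to(device="cuda", dtype=torch.long),
    max_new_tokens=32,
    do_sample=False,
)
print("hf tokens:", hf_out[0].tolist())

# Next step (not shown): pass the same input_ids, cast to int32, to the
# TRT-LLM engine and compare the two generated token sequences.
```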