Lzhang-hub

18 comments by Lzhang-hub

I tested with MODEL_TYPE=llava and it does not work. Was this closed because it is already supported?

These are the error logs:

```
Traceback (most recent call last):
  File "/data1/nfs15/nfs/bigdata/zhanglei/conda/envs/rtp-llm-0227/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data1/nfs15/nfs/bigdata/zhanglei/conda/envs/rtp-llm-0227/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/maga_transformer/start_server.py", ...
```

![image](https://github.com/ztxz16/fastllm/assets/57925599/17e0a196-2b73-4083-b032-e9a59b4d7536)

I looked at the source code: inside ForwardBatch, Forward is called in a for loop? Doesn't that mean batching gives no speedup at all?

@ztxz16 Ah, I see it now. But why would the latency grow linearly? With the chatglm2 model, testing directly against the /api/batch_chat endpoint of web_api in the demo also shows latency growing linearly with the length of the list.

> This way of testing measures the prefill stage; I suggest testing the decode stage.

@wildkid1024 My understanding is that testing directly against a web API deployment better matches the real model-serving scenario. We currently have a production requirement for batch inference; if we only test the decode stage, won't that fail to reflect the actual inference scenario?
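For reference, a minimal timing sketch against the /api/batch_chat endpoint mentioned above. The host, port, and payload format (`"prompts"` as a list of strings) are assumptions, not fastllm's documented schema, so adjust them to match the actual demo server. If batching is effective, latency should grow sublinearly with batch size; linear growth suggests the requests are processed one by one.

```python
# Hypothetical timing sketch for the /api/batch_chat endpoint.
# The URL and JSON payload shape are assumptions -- adapt to the demo server.
import time

import requests

URL = "http://127.0.0.1:8000/api/batch_chat"  # assumed host/port

for batch_size in (1, 2, 4, 8):
    prompts = ["Briefly introduce large language models."] * batch_size
    start = time.time()
    resp = requests.post(URL, json={"prompts": prompts}, timeout=600)
    resp.raise_for_status()
    elapsed = time.time() - start
    # Sublinear growth in elapsed time indicates real batching.
    print(f"batch_size={batch_size}: {elapsed:.2f}s")
```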

Same issue, any update?

@symphonylyh I cast input_ids to int32 with `input_ids = input_ids.to(torch.int32)` and got the same result.

I used torch.randint() to generate input_ids instead of the tokenizer; for the same input_ids, the results of the TRT-LLM model and the HF model still differ.

```python
import argparse
import json
import os
...
```
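For context, a minimal sketch of that comparison setup, assuming a hypothetical model path: seeded random input_ids (so both runtimes receive identical tokens) run through the HF model with greedy decoding. The TRT-LLM side is elided because its runner API varies across versions; the same ids, cast to int32 as discussed above, would be fed to the engine and the generated token sequences diffed.

```python
# Sketch only: MODEL_DIR is a placeholder, and the TRT-LLM call is omitted.
import torch
from transformers import AutoModelForCausalLM

MODEL_DIR = "/path/to/hf_model"  # placeholder path
SEQ_LEN = 32

torch.manual_seed(0)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR, torch_dtype=torch.float16
).to("cuda").eval()

# Fixed random token ids so both runtimes see identical inputs.
vocab_size = model.config.vocab_size
input_ids = torch.randint(0, vocab_size, (1, SEQ_LEN), dtype=torch.int32)

# HF reference run: greedy decoding (HF embedding lookups expect int64).
hf_out = model.generate(
    input_ids.to(device="cuda", dtype=torch.long),
    max_new_tokens=32,
    do_sample=False,
)
print("hf tokens:", hf_out[0].tolist())

# Next step (not shown): pass the same input_ids, cast to int32, to the
# TRT-LLM engine and compare the two generated token sequences.
```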