flankedge
It's 2022, and still no progress?
@CLIsVeryOK Not solved; I just switched to a Paddle model instead.
Hi. I've recently tested this implementation on [blip2_vicuna_instruct](https://github.com/salesforce/LAVIS). It uses the ViT + Q-Former embedding as a prefix soft embedding, which is fed into Vicuna along with the prompt's token_ids. According to my test result, I...
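For readers unfamiliar with the prefix-soft-embedding idea mentioned above, here is a minimal NumPy sketch. All shapes and names here are illustrative assumptions, not the actual LAVIS code: the Q-Former's output embeddings are concatenated in front of the prompt's token embeddings, and the combined sequence is passed to the language model as input embeddings instead of token ids.

```python
import numpy as np

# Illustrative sketch of a prefix soft embedding (shapes and names are
# assumptions, not the real blip2_vicuna_instruct implementation).
hidden = 8                                   # toy hidden size
vocab_table = np.random.rand(100, hidden)    # LM token-embedding table
qformer_out = np.random.rand(32, hidden)     # Q-Former query outputs (soft prefix)

prompt_ids = [5, 17, 42]                     # prompt token_ids
prompt_embeds = vocab_table[prompt_ids]      # shape (3, hidden)

# Soft prefix first, then the prompt's token embeddings; the LM would
# consume this combined matrix directly as its input embeddings.
inputs_embeds = np.concatenate([qformer_out, prompt_embeds], axis=0)
print(inputs_embeds.shape)
```

The key point is that the visual prefix occupies ordinary sequence positions, so the LM attends over it exactly as it would over text tokens.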
@QiJune It seems that 0.11.0 has some minor bugs with Qwen2-MoE int8 weight-only quantization. First, I got `AttributeError: 'PretrainedConfig' object has no attribute 'moe'` ``` [08/04/2024-21:33:19] [TRT-LLM] [W] Found...
I've left that behind; it no longer matters. But I believe you have resolved this problem by now, given how much time has elapsed and how much effort you and the team...
It's not a bug; it's a limitation of the tokenizer. Some characters need two token_ids to represent, so you have to decode them together. I'm not sure whether the latest `tensorrt_llm_bls`...
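To see why decoding token_ids one at a time breaks some characters, here is a small illustration (plain Python bytes, not the actual tokenizer): byte-level tokenizers can split one character's UTF-8 bytes across two tokens, and each fragment is undecodable on its own.

```python
# Illustration only: why two token_ids may need to be decoded together.
# A CJK character is 3 UTF-8 bytes; pretend two tokens carry the pieces.
full = "中".encode("utf-8")               # 3 bytes for one character
piece_a, piece_b = full[:2], full[2:]     # hypothetical per-token byte payloads

# Decoding each piece alone yields replacement characters, not text.
print(piece_a.decode("utf-8", errors="replace"))   # '\ufffd' garbage
print(piece_b.decode("utf-8", errors="replace"))   # '\ufffd' garbage

# Decoding the concatenated bytes recovers the character.
print((piece_a + piece_b).decode("utf-8"))         # 中
```

This is why a streaming detokenizer has to buffer incomplete byte sequences until the following token arrives, rather than emitting each token's decode immediately.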
> > It's not a bug; it's a limitation of the tokenizer. Some characters need two token_ids to represent, so you have to decode them together. I'm not sure whether the...
Have you tried sending the request with stream=false, to confirm whether it's a tokenizer decoding issue or an accuracy issue?
> > Have you tried sending the request with stream=false, to confirm whether it's a tokenizer decoding issue or an accuracy issue? > > @handoku Yes, if stream=false, the Chinese in the...
Hi, everyone. As a user of the trtllm backend, I noticed that a [model.py](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/tensorrt_llm/1/model.py) was added on the main branch. Are you going to replace this C++ backend with a Python backend? move...