flankedge
It's 2022, and still no progress?
@CLIsVeryOK Not solved; I just switched to a Paddle model instead.
Hi. I've recently tested this implementation on [blip2_vicuna_instruct](https://github.com/salesforce/LAVIS). It uses the ViT + Q-Former embedding as a prefix soft embedding, which is fed into Vicuna along with the prompt's token_ids. According to my test result, I...
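For readers unfamiliar with the prefix-soft-embedding idea mentioned above, here is a minimal NumPy sketch. All shapes and names here are illustrative assumptions, not the actual LAVIS code: the Q-Former's output embeddings are concatenated in front of the prompt's token embeddings, and the combined sequence is passed to the language model as input embeddings instead of token ids.

```python
import numpy as np

# Illustrative sketch of a prefix soft embedding (shapes and names are
# assumptions, not the real blip2_vicuna_instruct implementation).
hidden = 8                                   # toy hidden size
vocab_table = np.random.rand(100, hidden)    # LM token-embedding table
qformer_out = np.random.rand(32, hidden)     # Q-Former query outputs (soft prefix)

prompt_ids = [5, 17, 42]                     # prompt token_ids
prompt_embeds = vocab_table[prompt_ids]      # shape (3, hidden)

# Soft prefix first, then the prompt's token embeddings; the LM would
# consume this combined matrix directly as its input embeddings.
inputs_embeds = np.concatenate([qformer_out, prompt_embeds], axis=0)
print(inputs_embeds.shape)
```

The key point is that the visual prefix occupies ordinary sequence positions, so the LM attends over it exactly as it would over text tokens.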
@QiJune It seems that 0.11.0 has some minor bugs with Qwen2-MoE int8 weight-only quantization. First, I got `AttributeError: 'PretrainedConfig' object has no attribute 'moe'` ``` [08/04/2024-21:33:19] [TRT-LLM] [W] Found...
I've left that behind; it no longer matters. But I believe you have resolved this problem by now, given how much time has elapsed and how much effort you and the team...
It's not a bug; it's a limitation of the tokenizer. Some characters need two token_ids to represent, so you have to decode them together. I'm not sure whether the latest `tensorrt_llm_bls`...
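To see why decoding token_ids one at a time breaks some characters, here is a small illustration (plain Python bytes, not the actual tokenizer): byte-level tokenizers can split one character's UTF-8 bytes across two tokens, and each fragment is undecodable on its own.

```python
# Illustration only: why two token_ids may need to be decoded together.
# A CJK character is 3 UTF-8 bytes; pretend two tokens carry the pieces.
full = "中".encode("utf-8")               # 3 bytes for one character
piece_a, piece_b = full[:2], full[2:]     # hypothetical per-token byte payloads

# Decoding each piece alone yields replacement characters, not text.
print(piece_a.decode("utf-8", errors="replace"))   # '\ufffd' garbage
print(piece_b.decode("utf-8", errors="replace"))   # '\ufffd' garbage

# Decoding the concatenated bytes recovers the character.
print((piece_a + piece_b).decode("utf-8"))         # 中
```

This is why a streaming detokenizer has to buffer incomplete byte sequences until the following token arrives, rather than emitting each token's decode immediately.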
> > It's not a bug; it's a limitation of the tokenizer. Some characters need two token_ids to represent, so you have to decode them together. I'm not sure whether the...
Have you tried sending the request with stream=false, to confirm whether it's a tokenizer decoding issue or an accuracy issue?
> > Have you tried sending the request with stream=false, to confirm whether it's a tokenizer decoding issue or an accuracy issue? > > @handoku Yes, if stream=false, the Chinese in the...
Hi, everyone. As a user of the trtllm backend, I noticed that a [model.py](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/tensorrt_llm/1/model.py) was added on the main branch. Are you going to replace this C++ backend with a Python backend? move...