FlyPanda
Locating crashes in tensorrt_llm::batch_manager::TrtGptModelInflightBatching::TokenPtr tensorrt_llm::batch_manager::TrtGptModelInflightBatching::decoderStepAsync(tensorrt_llm::batch_manager::RequestTable&, const ReqIdsVec&, const ReqIdsVec&), but the code may be closed source
Encountered an issue while using speculative decoding: '[TensorRT-LLM] [ERROR] Encountered an error in forward function: slice 501760 exceeds buffer size 250880'; 0.9.0.dev20240222000 works fine.
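One quick observation about the error above: the requested slice is exactly twice the allocated buffer, which may hint at a doubled dimension (e.g. batch or beam width) somewhere in the speculative-decoding path. This is only an arithmetic check, not a diagnosis; confirming the real cause would require the TensorRT-LLM source.

```python
# Arithmetic check on the numbers in the error message. The "doubled
# dimension" interpretation is an assumption, not a confirmed root cause.
slice_size = 501760   # size reported as requested by the slice
buffer_size = 250880  # size reported as allocated

ratio = slice_size / buffer_size
print(ratio)  # the slice is exactly 2x the buffer
```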
Is GatedMLP suitable for Medusa decoding? I found two characteristics during debugging: 1. The only difference between the modified bloom/model.py and llama lies in the MLP layer, where llama uses...
@rakib-hasan How can I verify the differences caused by the position-encoding algorithm? I found that forcibly setting "position_embedding_type": "rope_gpt_neox" in convert_checkpoint did not work.
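For reference, the kind of edit described above can be sketched as below: rewriting the position-embedding field in the config.json that convert_checkpoint.py emits. The field name "position_embedding_type" and the helper are assumptions based on typical TensorRT-LLM checkpoint configs. As the later reply notes, this only relabels the checkpoint metadata; a model trained with ALiBi will not produce correct outputs under RoPE just because the label changed.

```python
import json

# Hypothetical sketch: flip the position embedding type recorded in a
# converted checkpoint's config.json. This edits metadata only -- it does
# not change the trained weights or make an ALiBi model RoPE-compatible.
def set_position_embedding_type(config_path, new_type="rope_gpt_neox"):
    with open(config_path) as f:
        config = json.load(f)
    config["position_embedding_type"] = new_type  # field name assumed
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return config
```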
@sundayKK I adapted Qwen2-7B, but found that its output was completely different from the base model's, so it failed. You can follow the steps below: 1. Adapt qwen training...
> @skyCreateXian Apologies for the late response. That sounds correct. Changing the position encoding at inference time won't work as the Bloom model seems to be trained with ALiBi. The...
@poweiw no, thanks
It seems that this is caused by the absence of the lookahead parameter: the max_draft_len parameter is determined by the [(W, N, G) parameters](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/lookahead). Setting the decoding method to...
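The dependence of the draft-token budget on (W, N, G) can be sketched as below. The formula used here, (W + G) * (N - 1), is an assumption for illustration; the linked lookahead example in the TensorRT-LLM repository documents the authoritative mapping.

```python
# Hypothetical sketch: deriving a max_draft_len budget from lookahead
# decoding's (W, N, G) parameters (window size, n-gram size, verification
# set size). The formula below is assumed, not taken from TRT-LLM source.
def lookahead_max_draft_len(w, n, g):
    # Each of the W lookahead branches and G verification branches can
    # contribute up to N - 1 draft tokens per step under this assumption.
    return (w + g) * (n - 1)

print(lookahead_max_draft_len(5, 3, 5))  # (5 + 5) * (3 - 1) = 20
```

If the engine was built without these parameters, any downstream buffer sized from max_draft_len would be missing, which is consistent with the symptom described above.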