FlyPanda

Results: 8 comments by FlyPanda

Locating a crash in tensorrt_llm::batch_manager::TrtGptModelInflightBatching::TokenPtr tensorrt_llm::batch_manager::TrtGptModelInflightBatching::decoderStepAsync(tensorrt_llm::batch_manager::RequestTable&, const ReqIdsVec&, const ReqIdsVec&), but the code may be closed source

Encountered an issue while using speculative decoding: '[TensorRT-LLM] [ERROR] Encountered an error in forward function: slice 501760 exceeds buffer size 250880'. Version 0.9.0.dev20240222000 works normally.

Is GatedMLP suitable for Medusa decoding? I found two characteristics during debugging: 1. The only difference between the modified bloom/model.py and llama lies in the MLP layer, where llama uses...

@rakib-hasan How can I verify the differences caused by the position encoding algorithm? I found that forcibly setting position_embedding_type to "rope_gpt_neox" in convert_checkpoint did not work.

@sundayKK I adapted qwen2-7b, but found that the result was completely different from the base model, so it failed. You can follow the steps below: 1. Adapt qwen training...

> @skyCreateXian Apologies for the late response. That sounds correct. Changing the position encoding at inference time won't work as the Bloom model seems to be trained with ALiBi. The...
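The point about ALiBi in the reply above can be illustrated with a small sketch (not TensorRT-LLM code): ALiBi adds a fixed, slope-weighted linear penalty to attention scores based on key distance, and the model's weights are trained around those biases, so swapping in a rotary scheme like "rope_gpt_neox" at inference time cannot reproduce what the model learned. The helper names below are illustrative, not part of any library.

```python
import math

def alibi_slopes(n_heads: int) -> list[float]:
    # Standard ALiBi slopes for a power-of-two head count:
    # head i (1-indexed) gets slope 2 ** (-8 * i / n_heads).
    return [2 ** (-8 * i / n_heads) for i in range(1, n_heads + 1)]

def alibi_bias(slope: float, q_pos: int, k_pos: int) -> float:
    # ALiBi subtracts slope * distance from the attention score;
    # there are no rotary angles here to replace at inference time.
    return -slope * (q_pos - k_pos)

slopes = alibi_slopes(8)
print(slopes[0])                                  # 0.5
print(alibi_bias(slopes[0], q_pos=10, k_pos=6))   # -2.0
```

Because these biases are a structural part of the trained attention pattern, only retraining (or fine-tuning) with a different position encoding would change them.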

It seems that this is caused by the absence of the lookahead parameters, where the max_draft_len parameter is determined by the [(W, N, G) parameters](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/lookahead). Setting the decoding method to...
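As a rough sketch of how the (W, N, G) tuple bounds the draft length: one common sizing rule for lookahead decoding is that each step can propose up to (W + G) * (N - 1) draft tokens, so max_draft_len must be at least that large. This formula and the helper below are assumptions for illustration; check the examples/lookahead README for the exact rule in your TensorRT-LLM version.

```python
def lookahead_max_draft_len(w: int, n: int, g: int) -> int:
    # Assumed sizing rule: window size W and verification set size G
    # each contribute branches of (N - 1) candidate tokens per step.
    # Verify against the examples/lookahead docs for your version.
    return (w + g) * (n - 1)

# Hypothetical configuration: W=5, N=7, G=5.
print(lookahead_max_draft_len(5, 7, 5))  # 60
```

If the engine is built without budgeting for this draft length, runtime buffers sized for ordinary decoding will be too small, which matches the "slice exceeds buffer size" style of error.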