FlyPanda
Locating crashes in tensorrt_llm::batch_manager::TrtGptModelInflightBatching::TokenPtr tensorrt_llm::batch_manager::TrtGptModelInflightBatching::decoderStepAsync(tensorrt_llm::batch_manager::RequestTable&, const ReqIdsVec&, const ReqIdsVec&), but the code may be closed source
Encountered an issue while using speculative decoding: '[TensorRT-LLM] [ERROR] Encountered an error in forward function: slice 501760 exceeds buffer size 250880'; 0.9.0.dev20240222000 works fine.
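One quick observation about the error above: the requested slice is exactly twice the allocated buffer, which may hint at a doubled dimension (e.g. batch or beam width) somewhere in the speculative-decoding path. This is only an arithmetic check, not a diagnosis; confirming the real cause would require the TensorRT-LLM source.

```python
# Arithmetic check on the numbers in the error message. The "doubled
# dimension" interpretation is an assumption, not a confirmed root cause.
slice_size = 501760   # size reported as requested by the slice
buffer_size = 250880  # size reported as allocated

ratio = slice_size / buffer_size
print(ratio)  # the slice is exactly 2x the buffer
```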
Is GatedMLP suitable for Medusa decoding? I found two characteristics during debugging: 1. The only difference between the modified bloom/model.py and llama lies in the MLP layer, where llama uses...
@rakib-hasan How can I verify the differences caused by the position-encoding algorithm? I found that forcibly setting "position_embedding_type": "rope_gpt_neox" in convert_checkpoint did not work.
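For reference, the kind of edit described above can be sketched as below: rewriting the position-embedding field in the config.json that convert_checkpoint.py emits. The field name "position_embedding_type" and the helper are assumptions based on typical TensorRT-LLM checkpoint configs. As the later reply notes, this only relabels the checkpoint metadata; a model trained with ALiBi will not produce correct outputs under RoPE just because the label changed.

```python
import json

# Hypothetical sketch: flip the position embedding type recorded in a
# converted checkpoint's config.json. This edits metadata only -- it does
# not change the trained weights or make an ALiBi model RoPE-compatible.
def set_position_embedding_type(config_path, new_type="rope_gpt_neox"):
    with open(config_path) as f:
        config = json.load(f)
    config["position_embedding_type"] = new_type  # field name assumed
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return config
```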
@sundayKK I adapted Qwen2-7B, but found that its output was completely different from the base model's, so it failed. You can follow the steps below: 1. Adapt qwen training...
> @skyCreateXian Apologies for the late response. That sounds correct. Changing the position encoding at inference time won't work as the Bloom model seems to be trained with ALiBi. The...
@poweiw no, thanks
It seems that this is caused by the absence of the lookahead parameter: the max_draft_len parameter is determined by the [(W, N, G) parameters](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/lookahead). Setting the decoding method to...
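The dependence of the draft-token budget on (W, N, G) can be sketched as below. The formula used here, (W + G) * (N - 1), is an assumption for illustration; the linked lookahead example in the TensorRT-LLM repository documents the authoritative mapping.

```python
# Hypothetical sketch: deriving a max_draft_len budget from lookahead
# decoding's (W, N, G) parameters (window size, n-gram size, verification
# set size). The formula below is assumed, not taken from TRT-LLM source.
def lookahead_max_draft_len(w, n, g):
    # Each of the W lookahead branches and G verification branches can
    # contribute up to N - 1 draft tokens per step under this assumption.
    return (w + g) * (n - 1)

print(lookahead_max_draft_len(5, 3, 5))  # (5 + 5) * (3 - 1) = 20
```

If the engine was built without these parameters, any downstream buffer sized from max_draft_len would be missing, which is consistent with the symptom described above.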