That seems to rule out any of the extra optimizations we enabled (e.g. `--reduce_fusion` or `--user_buffer`, both Llama-specific). I don't know enough about the internals of TensorRT-LLM but maybe...
Thanks, I guess we'll need someone from NVIDIA to chime in here to make progress. Given that it seems to happen in pretty different setups on a very common model...
Hi, is there any update? This issue alone makes it pretty much impossible to use TensorRT-LLM for any serious production load (unless the in-flight batcher is disabled).
Hi, it seems like a new (pretty big) update was released yesterday: https://github.com/triton-inference-server/tensorrtllm_backend/pull/687 + https://github.com/NVIDIA/TensorRT-LLM/pull/2725 Skimming through the diff, I did not see any changes to the in-flight batcher, so...
@hypdeb do you have any insights on this issue by any chance? I see you have commented on similar-looking issues recently.
Hi @murenti, it is currently possible to delete a Goggle by following these steps: https://github.com/brave/goggles-quickstart/blob/main/getting-started.md#deleting-a-goggle I hope that helps,
Would you be able to share the URL of the Goggle you'd like to delete? (If it's private, we can do that via the support email instead.)
You may reach out to [[email protected]](mailto:[email protected])
Hi @hicallmeal, in general it should be safe to use the experimental version of tldts, but it really depends on your particular use case. What would be the cost if the...
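For context, a minimal sketch of what "using the experimental version" would look like; the package name `tldts-experimental` and its API parity with `tldts` are assumptions based on the project README, not something verified here:

```typescript
// Hedged sketch: swapping tldts for its experimental build. The import
// path and identical API are assumed per the tldts README; the
// experimental package trades the trie-based suffix lookup for a
// hash-based one.
import { getDomain, parse } from 'tldts-experimental';

// Same calls as with the stable `tldts` package:
console.log(getDomain('https://www.example.co.uk')); // 'example.co.uk'
console.log(parse('https://www.example.co.uk').publicSuffix); // 'co.uk'
```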
Hi @marcospassos and @samczsun, depending on which specification is followed, it is unclear whether underscores are allowed in a hostname at all (I believe we...
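To make the ambiguity concrete, here is a hedged sketch of probing how tldts treats an underscore-bearing hostname; the `validateHostname` option and the null-field behavior on rejection are assumptions based on the tldts documentation:

```typescript
import { parse } from 'tldts';

// With the default options, hostname validation may reject the
// underscore label, in which case the parsed fields come back null.
console.log(parse('foo_bar.example.com'));

// Assumption: passing `validateHostname: false` skips that check and
// lets the public-suffix extraction proceed despite the underscore.
console.log(parse('foo_bar.example.com', { validateHostname: false }));
```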