That seems to rule out any of the extra optimizations we enabled (e.g. `--reduce_fusion` or `--user_buffer`, both Llama-specific). I don't know enough about the internals of TensorRT-LLM but maybe...
Thanks, I guess we'll need someone from NVIDIA to chime in here to make progress. Given that it seems to happen in pretty different setups on a very common model...
Hi, is there any update? This issue alone makes it pretty much impossible to use TensorRT-LLM for any serious production load (unless the in-flight batcher is disabled).
Hi, it seems like a new (pretty big) update was released yesterday: https://github.com/triton-inference-server/tensorrtllm_backend/pull/687 + https://github.com/NVIDIA/TensorRT-LLM/pull/2725 Skimming through the diff, I did not see any changes to the in-flight batcher, so...
@hypdeb do you have any insights on this issue by any chance? I see you have commented on similar-looking issues recently.
Hi @murenti, it is currently possible to delete a Goggle by following these steps: https://github.com/brave/goggles-quickstart/blob/main/getting-started.md#deleting-a-goggle I hope that helps,
Would you be able to share the URL of the Goggle you'd like to delete? (If it's private, we can do that via the support email instead.)
You may reach out to [[email protected]](mailto:[email protected])
Hi @hicallmeal, in general it should be safe to use the experimental version of tldts, but it really depends on your particular use case. What would be the cost if the...
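For context, a minimal sketch of what "using the experimental version" would look like; the package name `tldts-experimental` and its API parity with `tldts` are assumptions based on the project README, not something verified here:

```typescript
// Hedged sketch: swapping tldts for its experimental build. The import
// path and identical API are assumed per the tldts README; the
// experimental package trades the trie-based suffix lookup for a
// hash-based one.
import { getDomain, parse } from 'tldts-experimental';

// Same calls as with the stable `tldts` package:
console.log(getDomain('https://www.example.co.uk')); // 'example.co.uk'
console.log(parse('https://www.example.co.uk').publicSuffix); // 'co.uk'
```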
Hi @marcospassos and @samczsun, depending on which specification is followed, it is unclear whether underscores are allowed in a hostname at all (I believe we...
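To make the ambiguity concrete, here is a hedged sketch of probing how tldts treats an underscore-bearing hostname; the `validateHostname` option and the null-field behavior on rejection are assumptions based on the tldts documentation:

```typescript
import { parse } from 'tldts';

// With the default options, hostname validation may reject the
// underscore label, in which case the parsed fields come back null.
console.log(parse('foo_bar.example.com'));

// Assumption: passing `validateHostname: false` skips that check and
// lets the public-suffix extraction proceed despite the underscore.
console.log(parse('foo_bar.example.com', { validateHostname: false }));
```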