flankedge
> Guys, thanks for the feedback. If you are on Blackwell silicon, please try building and running the Torch-flow DeepSeek-V3 following the instructions [here](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/deepseek_v3); the `deepseek` branch is deprecated and no longer updated...
Testing with a 1k input sequence length is unrealistic. Chat, RAG, agent workloads, etc. -- virtually any real use case will hit the `max_num_tokens` limit. And it appears that `max_num_tokens` has a significant impact on...
emmm... actually, if you print `args`, you will find that `args["model_instance_device_id"]` or `args["model_instance_name"]` may be the value you want.
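For context, a minimal sketch of how those keys are typically read in a Triton Python backend `model.py`. The key names come from the comment above; the `TritonPythonModel.initialize(args)` shape follows the Python backend's convention, and the attribute names (`device_id`, `instance_name`) are just illustrative:

```python
import json


class TritonPythonModel:
    def initialize(self, args):
        # `args` is a dict of strings passed in by the Triton Python backend.
        # The device index and instance name arrive as strings, so convert as needed.
        self.device_id = int(args["model_instance_device_id"])
        self.instance_name = args["model_instance_name"]
        # The full model configuration is delivered as a JSON string.
        self.model_config = json.loads(args["model_config"])
```

Outside the server you can exercise it with a hand-built `args` dict, e.g. `TritonPythonModel().initialize({"model_instance_device_id": "0", "model_instance_name": "deepseek_0", "model_config": "{}"})`.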
We performed a test using Triton Server 25.06 (based on CUDA 12.9). The anomalous results persist, which suggests that the issue is likely unrelated to NVIDIA CUDA driver compatibility. Theoretically,...
The root cause has been identified. Disabling `output_copy_stream` resolves the issue. It appears that the output copy stream does not correctly synchronize with the CUDA Graph execution stream. We look...