juney-nvidia
> When is the next meeting?

The first online meet-up will be arranged at the end of April, in which we will introduce the latest status of the PyTorch-centric re-architecture of...
> When is the next meeting?

We are working with the prod team to prepare it, @laikhtewari. When it becomes ready, we will share it with the public. Thanks, June
> I’d like to suggest two topics for discussion in the upcoming meet-ups:
>
> * Getting Started with TensorRT-LLM: A beginner-friendly guide on how new contributors can start learning...
@kaiyux @Kefeng-Duan for visibility on this question from the community. @laikhtewari for visibility as well. June
> ```
> trtllm-serve nvidia/DeepSeek-R1-FP4 \
>   --max_batch_size 256 --max_num_tokens 32768 \
>   --max_seq_len 32768 --kv_cache_free_gpu_memory_fraction 0.95 \
>   --host 0.0.0.0 --port 30001 --trust_remote_code --backend pytorch --tp_size 8 --ep_size 8...
> ```
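For anyone trying a command like the one quoted above, here is a minimal sketch of how the launched server could be queried, assuming `trtllm-serve` exposes an OpenAI-compatible `/v1/chat/completions` endpoint on the host and port from the command; adjust the URL and model name to your deployment.

```python
# Hedged sketch: query a trtllm-serve instance via its OpenAI-compatible HTTP API.
# Assumptions: the server from the quoted command is reachable at localhost:30001
# and the served model name matches the checkpoint passed to trtllm-serve.
import requests

payload = {
    "model": "nvidia/DeepSeek-R1-FP4",
    "messages": [{"role": "user", "content": "Summarize what expert parallelism is."}],
    "max_tokens": 128,
}

resp = requests.post(
    "http://localhost:30001/v1/chat/completions", json=payload, timeout=300
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```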
> Great to hear this! [@juney-nvidia](https://github.com/juney-nvidia), do we have a plan to set up EP partition analytic models?
>
> It is generally believed that EP should be evenly distributed...
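To make the "evenly distributed" intuition concrete, here is a toy illustrative sketch (not TensorRT-LLM's actual partitioning code) that assigns experts to EP ranks round-robin and reports how a routing trace loads each rank; a real analytic model would also account for routing skew across experts.

```python
# Illustrative only: a round-robin expert-to-rank mapping showing what an "even"
# EP partition looks like, plus a per-rank load count for a sample routing trace.
from collections import Counter

def partition_experts(num_experts: int, ep_size: int) -> dict[int, int]:
    """Map each expert id to an EP rank, round-robin."""
    return {e: e % ep_size for e in range(num_experts)}

def per_rank_load(routed_expert_ids: list[int], mapping: dict[int, int]) -> Counter:
    """Count how many routed tokens land on each EP rank."""
    return Counter(mapping[e] for e in routed_expert_ids)

mapping = partition_experts(num_experts=256, ep_size=8)  # 32 experts per rank
print(per_rank_load([0, 1, 8, 9, 17, 255], mapping))     # skewed toy trace
```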
Hi @khayamgondal, we did some performance studies earlier on offloading the KV cache to CPU, and the findings at that time showed there isn't a perf gain, so we only...
> Thanks, June I'm working on a study to understand how much hit performance > takes when part of the inference process (KV cache in this scenario) is > offloaded...
> Thanks [@juney-nvidia](https://github.com/juney-nvidia), I am looking at the `KvCacheConfig` class and wondering: if I set the following to 0, would this force the KV cache not to use the GPU?
>
> ...
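The specific field in the quoted question is truncated above, so for reference here is only a hedged sketch of constructing a `KvCacheConfig` through the LLM API; the parameters shown (`free_gpu_memory_fraction`, `host_cache_size`) are assumptions based on the public API and should be verified against your installed TensorRT-LLM version.

```python
# Hedged sketch, not an answer to the truncated question above: building a
# KvCacheConfig with parameters believed to exist in the TensorRT-LLM LLM API.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import KvCacheConfig

kv_cache_config = KvCacheConfig(
    free_gpu_memory_fraction=0.95,  # fraction of free GPU memory reserved for KV cache
    host_cache_size=0,              # assumed: bytes of host (CPU) memory for offload; 0 disables it
)

llm = LLM(model="nvidia/DeepSeek-R1-FP4", kv_cache_config=kv_cache_config)
```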
@lucaslie Thanks for improving dev productivity! @niukuo Since this is a container-related change, can you also help review this MR? June