juney-nvidia


Hi @mklachko, NVFP4 introduces a two-level quantization method: the first (top) level is a per-tensor quantization scaling factor, and the second level is a fine-grained blockwise quantization scaling factor, and yes...

> [@juney-nvidia](https://github.com/juney-nvidia) thanks! Why do we need the two levels of scaling factors? Is it because of the limited range of FP8? Can you please point me to the relevant...
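A minimal NumPy sketch of such a two-level scheme, assuming NVFP4's 16-element blocks, an FP4 (E2M1) max magnitude of 6.0, and an FP8 (E4M3) max magnitude of 448; the function names are illustrative, not the TensorRT-LLM API. It also suggests an answer to the range question: the per-tensor scale is what keeps every per-block scale within FP8's limited range.

```python
import numpy as np

FP4_MAX = 6.0     # largest magnitude representable in the 4-bit E2M1 format
FP8_MAX = 448.0   # largest magnitude representable in FP8 E4M3
BLOCK = 16        # NVFP4 quantizes in blocks of 16 elements

def quantize_two_level(x):
    # Level 1: one per-tensor scale, chosen so that every per-block scale,
    # once expressed relative to it, fits inside FP8's limited range.
    tensor_scale = max(np.abs(x).max(), 1e-12) / (FP4_MAX * FP8_MAX)
    blocks = x.reshape(-1, BLOCK)
    # Level 2: one fine-grained scale per 16-element block.
    block_scales = np.maximum(np.abs(blocks).max(axis=1, keepdims=True), 1e-12) / FP4_MAX
    # Relative block scales; <= FP8_MAX by construction, so they can be
    # stored in FP8 E4M3 in the real format.
    block_scales_fp8 = block_scales / tensor_scale
    # The values themselves; real NVFP4 would cast these to 4-bit floats.
    q = np.clip(blocks / block_scales, -FP4_MAX, FP4_MAX)
    return q, block_scales_fp8, tensor_scale

def dequantize_two_level(q, block_scales_fp8, tensor_scale):
    # Undo both levels of scaling.
    return (q * block_scales_fp8 * tensor_scale).reshape(-1)

x = np.random.randn(4 * BLOCK).astype(np.float32)
q, s_block, s_tensor = quantize_two_level(x)
x_hat = dequantize_two_level(q, s_block, s_tensor)
# Near-zero here, since this sketch computes the scales but skips the
# actual FP4/FP8 casting that introduces quantization error.
print(np.abs(x - x_hat).max())
```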

@MartinMarciniszyn for vis. + @zeroepoch @chzblych @mk-nvidia for vis. @aspctu Hi, we are working on publishing pre-built containers to the public to make it easier for the community to use, and it...

Adding @kaiyux to make sure he is aware of the addition of AutoDeploy as another backend for trtllm-bench. Thanks June

@Fridah-nv Hi Frida, since this is just a doc change, to save time, after you finish the refinement based on Lucas's feedback you can just run the "bot skip" command...

@chuangz0 @xiaoweiw-nv @pcastonguay pls help review this MR. Thanks June

@liquanfeng pls rebase this MR onto the latest main branch. @Barry-Delaney pls help review this MR once it's ready. Thanks June

@chuangz0 @xiaoweiw-nv @pcastonguay can you help review this PD-related MR? Thanks June

Also tagging @pcastonguay @schetlur-nv for vis. on this UCX backend support MR for disaggregated serving. Thanks June

@jeffye-dev Hi, the 253 perf number was generated with the [trtllm-bench](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/deepseek_v3#running-the-benchmark) command. Also, to achieve the 253 perf number, some required MRs are being prepared to be merged into the...