David Blincoe comments

David Blincoe

[Core] Support full cuda graph in v1

I ran some latency-focused testing on this PR using LLaMA 3.2 1B Instruct with a small batch size (~1-2) in a highly latency-constrained setting where minimizing CUDA graph launches can...