Is there a roadmap or intention to support CUDA Graph?

Open guoqingbao opened this issue 5 months ago • 4 comments

vLLM v1 uses CUDA Graphs to capture the execution of the entire model, which gives a significant performance improvement over the previous version. Are there any plans to support CUDA Graphs in Candle? Would it be possible to add `start_capture`, `end_capture`, and `replay` to `Module`, so that the captured graph can be replayed within the `forward` method? @LaurentMazare
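To make the request concrete, here is a rough CPU-only sketch of the capture/replay control flow. Everything in it is hypothetical: `Stream`, `Kernel`, `ScaleShift`, and `run_demo` are made-up names, nothing here is existing Candle API, and no real CUDA calls are made. It only mimics the behaviour that launches issued during capture are recorded instead of executed, and run later on replay.

```rust
/// Stand-in for a GPU kernel launch (hypothetical); a real backend would
/// enqueue CUDA work instead of looping over a slice on the CPU.
#[derive(Clone, Copy, Debug)]
enum Kernel {
    Scale(f32),
    Shift(f32),
}

impl Kernel {
    fn run(self, buf: &mut [f32]) {
        match self {
            Kernel::Scale(s) => buf.iter_mut().for_each(|x| *x *= s),
            Kernel::Shift(t) => buf.iter_mut().for_each(|x| *x += t),
        }
    }
}

/// Hypothetical stream: runs kernels eagerly, or records them while capturing.
#[derive(Default)]
struct Stream {
    capturing: bool,
    graph: Vec<Kernel>,
}

impl Stream {
    fn start_capture(&mut self) {
        self.graph.clear();
        self.capturing = true;
    }

    fn end_capture(&mut self) {
        self.capturing = false;
    }

    fn launch(&mut self, kernel: Kernel, buf: &mut [f32]) {
        if self.capturing {
            // Recorded only; the buffer is left untouched during capture.
            self.graph.push(kernel);
        } else {
            kernel.run(buf);
        }
    }

    /// Re-issue every recorded kernel in order.
    fn replay(&self, buf: &mut [f32]) {
        for &kernel in &self.graph {
            kernel.run(buf);
        }
    }
}

/// Toy layer whose forward pass issues two "kernels".
struct ScaleShift {
    scale: f32,
    shift: f32,
}

impl ScaleShift {
    fn forward(&self, stream: &mut Stream, buf: &mut [f32]) {
        stream.launch(Kernel::Scale(self.scale), buf);
        stream.launch(Kernel::Shift(self.shift), buf);
    }
}

/// Runs the layer once eagerly and once through capture + replay.
fn run_demo() -> (Vec<f32>, Vec<f32>) {
    let layer = ScaleShift { scale: 2.0, shift: 1.0 };
    let mut stream = Stream::default();

    let mut eager = vec![1.0, 2.0, 3.0];
    layer.forward(&mut stream, &mut eager);

    let mut replayed = vec![1.0, 2.0, 3.0];
    stream.start_capture();
    layer.forward(&mut stream, &mut replayed); // records two kernels
    stream.end_capture();
    stream.replay(&mut replayed); // executes the recorded kernels

    (eager, replayed)
}
```

Calling `run_demo()` returns the eager result and the replayed result, which should match. In a real implementation the graph would also be tied to fixed device buffers and shapes, which this sketch ignores.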

Eric may also be interested in this @EricLBuehler

guoqingbao • Jun 23 '25 10:06