candle
Is there a roadmap or intention to support CUDA Graph?
vLLM v1 uses CUDA Graphs to capture the execution of the entire model, which gives a significant performance improvement over the previous version. Are there any plans to support CUDA Graphs in Candle? Would it be possible to add start_capture, end_capture, and replay methods to Module, so that a captured graph can be replayed inside the forward method? @LaurentMazare
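To make the request more concrete, here is a rough sketch of the shape such an API could take. It is a CPU-only mock, not Candle code: `CapturingModule`, `CapturedGraph`, and `launch` are hypothetical names invented for illustration, and the "graph" is just a recorded list of closures rather than a real CUDA graph handle. Only the method names start_capture, end_capture, and replay come from the proposal above.

```rust
/// Hypothetical stand-in for a captured CUDA graph: an ordered list of ops.
/// A real implementation would hold a CUDA graph handle instead.
struct CapturedGraph {
    ops: Vec<Box<dyn Fn(f32) -> f32>>,
}

/// Hypothetical module exposing the three proposed methods.
/// This is a mock for discussion, not part of Candle.
struct CapturingModule {
    recording: Option<Vec<Box<dyn Fn(f32) -> f32>>>,
    graph: Option<CapturedGraph>,
}

impl CapturingModule {
    fn new() -> Self {
        Self { recording: None, graph: None }
    }

    /// Start recording every op launched from now on.
    fn start_capture(&mut self) {
        self.recording = Some(Vec::new());
    }

    /// Run one op (a stand-in for a kernel launch) and record it if capturing.
    /// Note: real CUDA stream capture records work without executing it;
    /// this mock executes eagerly to stay simple.
    fn launch(&mut self, x: f32, op: impl Fn(f32) -> f32 + 'static) -> f32 {
        let y = op(x);
        if let Some(ops) = self.recording.as_mut() {
            ops.push(Box::new(op));
        }
        y
    }

    /// Stop recording and keep the recorded ops as the captured graph.
    fn end_capture(&mut self) {
        if let Some(ops) = self.recording.take() {
            self.graph = Some(CapturedGraph { ops });
        }
    }

    /// Replay the captured ops on a new input; None if nothing was captured.
    fn replay(&self, x: f32) -> Option<f32> {
        self.graph
            .as_ref()
            .map(|g| g.ops.iter().fold(x, |acc, op| op(acc)))
    }

    /// Forward pass: replay the graph when one exists, otherwise run eagerly.
    fn forward(&mut self, x: f32) -> f32 {
        if let Some(y) = self.replay(x) {
            return y;
        }
        let h = self.launch(x, |v| v * 2.0);
        self.launch(h, |v| v + 1.0)
    }
}
```

The idea would be to call start_capture once, run forward on a warm-up input, call end_capture, and from then on have forward go through replay. A real version would also have to deal with fixed input shapes and static buffers, which this mock ignores.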
Eric may also be interested in this @EricLBuehler