
[benchmark_inference] Investigate why the first and last Dynamo subgraph are not wrapped with cuda graph regions

Open mattteochen opened this issue 1 month ago • 0 comments

The nsys profiles show that 4 Dynamo subgraphs are generated, but only 2 of them are launched with CUDA Graphs:

Image

For reference, see this doc: https://docs.google.com/document/d/1iv8-ujpih7hScQd90nhrl30xJ4D7Q7fRS2hdKuYfu3E/edit?tab=t.0

Repro:

python thunder/benchmarks/benchmark_inference.py --input-length 4096 --output-length 4 --mode thunder --enable-nv-linear --warmup-iterations 2 --num-iterations 2 --enable-thunder-cudagraph
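
To capture a profile like the one referenced above, the repro can be wrapped in `nsys profile`. This is only a sketch: it assumes `nsys` is on `PATH`, and the report name `benchmark_inference_cudagraph` is a hypothetical placeholder, not something from this issue. It only assembles and prints the command; it does not run it.

```python
import shlex

# Repro arguments copied verbatim from the issue.
REPRO_ARGS = [
    "python", "thunder/benchmarks/benchmark_inference.py",
    "--input-length", "4096",
    "--output-length", "4",
    "--mode", "thunder",
    "--enable-nv-linear",
    "--warmup-iterations", "2",
    "--num-iterations", "2",
    "--enable-thunder-cudagraph",
]


def build_nsys_command(output_name="benchmark_inference_cudagraph"):
    """Return the repro wrapped in `nsys profile` as an argv list.

    output_name is a hypothetical report name (assumption, not from the issue).
    --trace=cuda,nvtx asks nsys to record CUDA API activity and NVTX ranges.
    """
    return ["nsys", "profile", "--trace=cuda,nvtx", "-o", output_name, *REPRO_ARGS]


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(build_nsys_command()))
```

The printed command can be pasted into a shell on a machine with Nsight Systems installed; the resulting report can then be opened in the Nsight Systems GUI to check which subgraphs are launched as CUDA Graphs.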

mattteochen, Nov 04 '25 14:11