lightning-thunder
[benchmark_inference] Investigate why the first and last Dynamo subgraphs are not wrapped in CUDA graph regions
The nsys profiles show that 4 Dynamo subgraphs are generated, but only 2 of them are launched with CUDA Graphs.
For reference, see this doc: https://docs.google.com/document/d/1iv8-ujpih7hScQd90nhrl30xJ4D7Q7fRS2hdKuYfu3E/edit?tab=t.0
Repro:
python thunder/benchmarks/benchmark_inference.py --input-length 4096 --output-length 4 --mode thunder --enable-nv-linear --warmup-iterations 2 --num-iterations 2 --enable-thunder-cudagraph