feat: Support storing CUDAGraphs for different input profiles

Open peri044 opened this issue 4 months ago • 0 comments

Description

Currently, CUDAGraphs are reset whenever inputs with a different shape are observed. Instead, store one CUDAGraph per input shape key. This is especially important for LLM inference, where prefill and decode have different input shapes.
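
A minimal sketch of the proposed behavior, to make the request concrete. It is not the Torch-TensorRT implementation: `shape_key`, `GraphCache`, and the `capture` callback are hypothetical names, and the callback stands in for whatever actually records a CUDA graph. The point is only that a new shape adds an entry to the cache instead of discarding the graph that was recorded earlier.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical key type: one tuple of dimensions per input tensor.
ShapeKey = Tuple[Tuple[int, ...], ...]


def shape_key(shapes: Sequence[Sequence[int]]) -> ShapeKey:
    """Build a hashable key from the shapes of all inputs."""
    return tuple(tuple(int(d) for d in shape) for shape in shapes)


class GraphCache:
    """Keeps one captured graph per input shape key (illustrative only)."""

    def __init__(self, capture: Callable[[ShapeKey], object]) -> None:
        # `capture` is an assumed callback that records a graph for a key.
        self._capture = capture
        self._graphs: Dict[ShapeKey, object] = {}
        self.captures = 0  # number of times a graph had to be recorded

    def get(self, shapes: Sequence[Sequence[int]]) -> object:
        key = shape_key(shapes)
        graph = self._graphs.get(key)
        if graph is None:
            # First time this shape is seen: record and keep the graph.
            graph = self._capture(key)
            self._graphs[key] = graph
            self.captures += 1
        return graph

    def __len__(self) -> int:
        return len(self._graphs)
```

With such a cache, a prefill call with shape `(1, 128)` followed by decode calls with shape `(1, 1)` would record two graphs in total, and a later prefill call would reuse the first one rather than trigger a re-record.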

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • [ ] My code follows the style guidelines of this project (You can use the linters)
  • [ ] I have performed a self-review of my own code
  • [ ] I have commented my code, particularly in hard-to-understand areas and hacks
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have added tests to verify my fix or my feature
  • [ ] New and existing unit tests pass locally with my changes
  • [ ] I have added the relevant labels to my PR so that relevant reviewers are notified

peri044 • Aug 25 '25 19:08