Suyog Gupta
1. This MR enables the integration of TRTLLM-bench with AutoDeploy.
2. Adds a feature to the AutoDeploy inference optimizer to inflate the KV caches to the available GPU memory. This helps improve...
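The cache-inflation idea in item 2 comes down to simple arithmetic: measure the free GPU memory, then work out how many cache pages fit in it. The following is a minimal sketch of that calculation only, not AutoDeploy's implementation. The function name `kv_cache_pages`, all of its parameters, and the `free_mem_fraction` safety margin are hypothetical and chosen for illustration.

```python
def kv_cache_pages(free_bytes, num_layers, num_kv_heads, head_dim,
                   page_size_tokens, dtype_bytes=2, free_mem_fraction=0.9):
    """Return how many KV-cache pages fit in a free-memory budget.

    Hypothetical helper for illustration; not part of AutoDeploy.
    """
    # Each token stores one key vector and one value vector
    # per layer and per KV head, hence the leading factor of 2.
    bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    bytes_per_page = bytes_per_token * page_size_tokens
    # Keep some headroom (assumed margin) instead of using all free memory.
    budget = int(free_bytes * free_mem_fraction)
    return budget // bytes_per_page


# Example: 10 GiB free, half of it budgeted for the cache.
pages = kv_cache_pages(
    free_bytes=10 * 1024**3,
    num_layers=32,
    num_kv_heads=8,
    head_dim=128,
    page_size_tokens=64,
    dtype_bytes=2,            # e.g. a 16-bit dtype
    free_mem_fraction=0.5,
)
print(pages)
```

In a real system the free-memory figure would be measured after the model weights and activation buffers are allocated, so the cache can only grow into memory that is otherwise unused.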
Add an `auto_deploy` namespace to uniquely identify all the custom ops defined in auto_deploy/custom_ops. This helps avoid potential namespace conflicts with ops defined in the manual workflow.
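To illustrate why a dedicated namespace prevents collisions, here is a small sketch of a registry keyed by `namespace::name`, the qualified-name convention PyTorch uses for custom ops. The `OpRegistry` class, the `manual` namespace, and the op names are hypothetical stand-ins and are not taken from the AutoDeploy code.

```python
class OpRegistry:
    """Toy op registry keyed by a qualified 'namespace::name' string."""

    def __init__(self):
        self._ops = {}

    def register(self, namespace, name):
        qualified = f"{namespace}::{name}"

        def decorator(fn):
            # Reject a second registration under the same qualified name.
            if qualified in self._ops:
                raise ValueError(f"op already registered: {qualified}")
            self._ops[qualified] = fn
            return fn

        return decorator

    def get(self, qualified):
        return self._ops[qualified]


registry = OpRegistry()


# Two ops share the short name "attention" but live in different namespaces,
# so they do not conflict.
@registry.register("auto_deploy", "attention")
def auto_deploy_attention(x):
    return ("auto_deploy", x)


@registry.register("manual", "attention")  # hypothetical second namespace
def manual_attention(x):
    return ("manual", x)


print(registry.get("auto_deploy::attention")(1))
print(registry.get("manual::attention")(1))
```

Without the namespace prefix, the second registration would raise an error or silently replace the first, depending on the registry's policy.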
## Description

1. Custom ops that wrap `nvtx.start_range` and `nvtx.end_range` markers
2. An annotation pass that inserts the markers in the graph

Example: markers inserted for all ops in the...
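The shape of such an annotation pass can be sketched on a toy graph: walk the nodes in order and bracket each selected node with a start marker and an end marker. This is only an illustration under simplifying assumptions. The graph is a plain Python list rather than the real graph representation, the `Node` class, the `range_start`/`range_end` op names, and `annotate_with_ranges` are hypothetical, and no actual NVTX calls are made.

```python
from dataclasses import dataclass


@dataclass
class Node:
    op: str    # kind of operation, e.g. "linear"
    name: str  # unique node name, reused as the range label


def annotate_with_ranges(nodes, should_annotate=lambda node: True):
    """Return a new node list with marker nodes around selected nodes."""
    annotated = []
    for node in nodes:
        if should_annotate(node):
            # The marker nodes stand in for the custom ops that would
            # start and end a profiling range around this node.
            annotated.append(Node("range_start", node.name))
            annotated.append(node)
            annotated.append(Node("range_end", node.name))
        else:
            annotated.append(node)
    return annotated


graph = [Node("linear", "fc1"), Node("relu", "act")]

# Annotate only the linear layers.
annotated = annotate_with_ranges(graph, lambda node: node.op == "linear")
for node in annotated:
    print(node.op, node.name)
```

Passing a predicate keeps the pass selective, so markers can be limited to the ops of interest instead of wrapping every node.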
## Summary by CodeRabbit

## Release Notes

* **Chores**
  * Added a new compile-stage transform configuration option (disabled by default) to expand optimization capabilities while maintaining backward compatibility.