Polygraphy only supports a single input sample (iteration) during debug reduce
Description
The Polygraphy CLI examples show how to implement a custom data loader script. In that example, the custom data loader returns a generator which yields 5 distinct input samples.
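For context, a custom data loader script in the style of that example defines a `load_data()` generator yielding one feed dict per iteration. The shape and tensor name below are illustrative, not taken from the tutorial:

```python
# Minimal sketch of a Polygraphy custom data loader script.
# load_data() is a generator; each yielded dict is one input sample (iteration).
from collections import OrderedDict

import numpy as np

INPUT_SHAPE = (1, 3, 28, 28)  # hypothetical model input shape


def load_data():
    for i in range(5):  # 5 distinct input samples, as in the CLI example
        # Scale by the iteration index so each sample is distinct.
        yield OrderedDict([("x", np.ones(INPUT_SHAPE, dtype=np.float32) * i)])
```

Each yielded dict maps input tensor names to NumPy arrays; the issue described below arises precisely when this generator yields more than one such dict.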
Another CLI tutorial describes how to reduce failing ONNX models using debug reduce. That tutorial mentions:
Though we're using a file here, input data can be provided via any other Polygraphy data loader argument covered in the CLI user guide.
However, the tutorial does not mention that debug reduce does not support custom data loader scripts which return more than 1 input sample/iteration.
As the code does not check for the existence of multiple input samples, it fails silently instead of printing an error message: wrong intermediate results are used during the comparison, which leads to incorrect comparison results (e.g., the comparison failing even though it should succeed, or vice versa) and to wrong subgraphs being identified as faulty.
Underlying issues in polygraphy
The following two underlying issues are contributing to the described behavior:
1.) Initialization bug in polygraphy data to-input
Step 2 of the tutorial on reducing failing ONNX models relies on the polygraphy data to-input sub-tool. There is a bug in the tool that causes it to generate an incorrect layerwise_inputs.json when the custom data loader returns more than 1 input sample (=iteration):
https://github.com/NVIDIA/TensorRT/blob/a833f79454d313c259a561704b655be21eb30c15/tools/Polygraphy/polygraphy/tools/data/subtool/to_input.py#L48-L64
Here, the padding in line 61 is performed incorrectly. Note that this line is always executed, since inputs starts out as an empty list (line 48).
Instead of creating a new, empty OrderedDict for each input sample (e.g. via a list comprehension over range(len(new_inputs) - len(inputs))), multiplying the single-element list [OrderedDict()] by a scalar inserts the same dictionary instance into the list N times. As a result, when e.g. the 2nd input sample is loaded into the dict via inp.update(new_inp), the tensors of the 1st input sample are erroneously overwritten.
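The aliasing can be demonstrated in isolation (string values stand in for tensors):

```python
from collections import OrderedDict

# Buggy padding as in to_input.py: list multiplication repeats the SAME
# OrderedDict instance, so all entries alias one object.
buggy = [OrderedDict()] * 3
buggy[1]["x"] = "sample-1"
# Every entry now "contains" sample-1's tensor, because all three list
# slots point at the same dict.

# Correct padding: create a fresh OrderedDict per iteration.
fixed = [OrderedDict() for _ in range(3)]
fixed[1]["x"] = "sample-1"
# Only fixed[1] holds the update; the other iterations stay empty.
```

This is the standard Python pitfall of multiplying a list containing a mutable object; the list comprehension avoids it by evaluating `OrderedDict()` once per element.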
Thus, the layerwise intermediate input tensors stored in layerwise_inputs.json will (wrongly) be identical across all iterations, while the corresponding golden output tensors stored in layerwise_golden.json (used in step 3 of the tutorial) correctly vary across iterations/input samples. The comparison is therefore executed incorrectly, and any root-cause analysis of failing ONNX subgraphs will lead to wrong conclusions when more than 1 input sample is used.
2.) debug reduce constant-folds input tensors from only the 1st input iteration
If the model contains multiple branches and input reduction is enabled, polygraphy debug reduce uses load_tensors_from_fallback(tensors_to_freeze) to determine the input tensor values that shall be constant-folded:
https://github.com/NVIDIA/TensorRT/blob/a833f79454d313c259a561704b655be21eb30c15/tools/Polygraphy/polygraphy/tools/debug/subtool/reduce.py#L368-L401
That function, in turn, invokes OnnxInferShapesArgs.fallback_inference() via ONNX Runtime:
https://github.com/NVIDIA/TensorRT/blob/a833f79454d313c259a561704b655be21eb30c15/tools/Polygraphy/polygraphy/tools/debug/subtool/reduce.py#L279-L288
However, OnnxInferShapesArgs.fallback_inference() always operates only on the 1st input iteration, as it invokes __getitem__(0) on the DataLoaderCache in line 204:
https://github.com/NVIDIA/TensorRT/blob/a833f79454d313c259a561704b655be21eb30c15/tools/Polygraphy/polygraphy/tools/args/backend/onnx/loader.py#L195-L204
Instead, the correct approach would be to perform fallback inference for each input sample. In a subsequent step, a separate intermediate ONNX model would then need to be produced for each input iteration, with the respective model input branch replaced by the constant input corresponding to the N-th iteration.
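A hedged sketch of that proposed fix, with all names hypothetical (the two callables stand in for Polygraphy's fallback-inference and model-rewriting steps, which this sketch does not reproduce):

```python
def freeze_per_iteration(samples, tensors_to_freeze,
                         run_fallback_inference, make_reduced_model):
    """Produce one intermediate model per input iteration.

    samples: list of feed dicts, one per input iteration.
    tensors_to_freeze: names of the tensors to constant-fold.
    run_fallback_inference: stand-in for fallback inference; maps a feed
        dict to a dict of intermediate tensor values.
    make_reduced_model: stand-in for the model-rewriting step; maps the
        frozen constants to a reduced model.
    """
    reduced_models = []
    for feed_dict in samples:
        # Intermediate values for THIS iteration, not just iteration 0.
        values = run_fallback_inference(feed_dict)
        frozen = {name: values[name] for name in tensors_to_freeze}
        # One intermediate model per iteration, with the input branch
        # replaced by this iteration's constants.
        reduced_models.append(make_reduced_model(frozen))
    return reduced_models
```

The key point is the loop over samples: the current implementation effectively calls the fallback-inference step once, on iteration 0, and reuses its result for every comparison.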
Suggested workarounds
Since the 2nd issue above is not trivial to fix, the following workarounds could prevent other Polygraphy users from wasting hours or days of their time on the inconsistent results it produces:
- In the README files and the above tutorials, clearly state that reducing failing ONNX models using debug reduce only supports a single input sample/iteration (when using custom data, and when reduction of inputs is enabled).
- Within debug reduce, check whether the data loader returns more than 1 iteration; if so, raise an error.
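The second workaround could look roughly like the following sketch; the helper name and error type are illustrative, not Polygraphy API. Note that a real implementation inside debug reduce would need to cache the consumed samples rather than discard them, since data loaders may be one-shot generators:

```python
import itertools


def check_single_iteration(data_loader):
    """Fail fast if the data loader yields more than one input sample."""
    # Pull at most two items; a second item means multiple iterations.
    first_two = list(itertools.islice(iter(data_loader), 2))
    if len(first_two) > 1:
        raise ValueError(
            "debug reduce supports only a single input sample/iteration, "
            "but the provided data loader yielded more than one."
        )
```

Performing this check up front would turn the current silent failure into an immediate, explicit error.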
Environment
TensorRT Version: 10.13.3.9
NVIDIA GPU: A100
NVIDIA Driver Version: 575.51.03
CUDA Version: 12.8
CUDNN Version: 9.8
Polygraphy Version: 0.49.26
Operating System: Linux
Relevant Files
The above issues should occur whenever a custom data loader script is used that returns more than 1 input sample/iteration.
The issues should appear regardless of whether linear or bisect mode is used with polygraphy debug reduce.
Steps To Reproduce
Follow the tutorial on reducing failing ONNX models.
Use a custom data loader that returns >1 input iteration (e.g. 2).
Observed behavior: Comparison results are inconclusive; e.g., polygraphy identifies a wrong subgraph as failing.
Expected behavior:
Consistent comparison results and the correct failing subgraph identified; OR polygraphy should print an error message stating that multiple input samples/iterations are not supported in debug reduce mode; OR it should instruct the user to disable reduction of inputs via the command line in that case.
Have you tried the latest release?: Yes