Dheeraj Peri
We currently use the NVIDIA Model Optimizer toolkit, which inserts quantization nodes into the torch model via its quantize API: 1) https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/9c54aa1c47871d0541801a20962996461d805162/modelopt/torch/quantization/model_quant.py#L126 2) https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/9c54aa1c47871d0541801a20962996461d805162/modelopt/torch/quantization/tensor_quant.py#L229-L243 (definition of the custom ops that perform the quantization)....
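For context, the custom ops linked above conceptually perform fake quantization (a quantize-dequantize round trip) so the quantization error is visible to the rest of the graph. Here is a minimal pure-Python sketch of the per-tensor symmetric scheme; the function name and defaults are illustrative assumptions, not the ModelOpt API:

```python
def fake_quantize(values, amax, num_bits=8):
    # Illustrative sketch, NOT the ModelOpt implementation:
    # symmetric per-tensor quantize-dequantize ("fake quant").
    bound = (1 << (num_bits - 1)) - 1  # 127 for int8
    scale = amax / bound               # amax maps to the largest int level
    out = []
    for v in values:
        q = round(v / scale)                   # quantize to an integer level
        q = max(-bound - 1, min(bound, q))     # clamp to the int8 range
        out.append(q * scale)                  # dequantize back to float
    return out
```

In the real toolkit these nodes run as Torch custom ops over tensors and carry calibrated `amax` values; this sketch only shows the arithmetic they model.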
Thanks @junstar92 for the contribution. Instead of modifying the FX path, we should import these utilities from the dynamo path, since that is the one being actively developed. So, instead, can you...
Also, @junstar92, please rebase with main. Some of the CI failures should be resolved.
PR 3513 is merged. Please try with the latest build; 25.05 should include it. Please reopen in case the issue persists.
Rest of the code LGTM
Hello @yjjinjie, is this still an issue? If so, can you point me to the exact model in https://github.com/pytorch/TensorRT/issues/3127#issuecomment-2325512369 (`Matmul` or `Matmul2`) that exhibits the issue?
Model device budget:
- Include the PyTorch subgraph weights in the device budget as well (count the parameters and estimate their size).
- Conclusion: estimate the new budget by subtracting the PyTorch submodule sizes....
Thanks for filing this @broken-dream. I was able to reproduce this with a torch 2.5 nightly. It works with PyTorch 2.4, so this is a regression. We will investigate further.
Can you rebase?
Hello @kacper-kleczewski, I tried your script with the main branch (which is on `2.5.0.dev20240822+cu124`) and it works fine. Here's the slightly modified script that I tried ```py import torch...