Xiaodong (Vincent) Huang
Hi @deephog, we recommend doing PTQ first, then running QAT to fine-tune the weights with the quantization scales fixed. This helps convergence. In theory you can...
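To make the PTQ-then-QAT order concrete, here is a minimal sketch in plain Python. It is not the pytorch_quantization API: the function names, the symmetric int8 range, max calibration, and the toy one-weight model `y = w * x` are all assumptions chosen for illustration. The scale is computed once during calibration and is never updated while the weight is fine-tuned.

```python
# Sketch only: PTQ calibration followed by QAT with the scale frozen.
# All names here are hypothetical; this is not the pytorch_quantization API.

QMAX = 127  # assumed symmetric signed 8-bit range [-127, 127]


def calibrate_scale(values):
    """PTQ step: max calibration, scale = amax / QMAX."""
    amax = max(abs(v) for v in values)
    return amax / QMAX


def fake_quant(x, scale):
    """Quantize to the integer grid, clamp, then dequantize."""
    q = round(x / scale)
    q = max(-QMAX, min(QMAX, q))
    return q * scale


def qat_finetune(w, scale, data, lr=0.05, steps=200):
    """QAT step: train w through fake_quant while scale stays fixed.

    Uses a straight-through estimator: the gradient of fake_quant(w)
    with respect to w is treated as 1.
    """
    for _ in range(steps):
        wq = fake_quant(w, scale)
        # Gradient of the mean squared error of y = wq * x.
        grad = sum(2 * (wq * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# 1) PTQ: calibrate the scale on the pretrained weight.
w_pretrained = 1.0
scale = calibrate_scale([w_pretrained])

# 2) QAT: fine-tune the weight with that scale held fixed.
data = [(1.0, 0.8), (2.0, 1.6)]  # toy samples of y = 0.8 * x
w_ft = qat_finetune(w_pretrained, scale, data)
```

After fine-tuning, `fake_quant(w_ft, scale)` lands on one of the grid points next to 0.8. Because the grid itself does not move during training, the optimizer only has to adjust the weight, which is one plausible reading of why fixed scales help convergence.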
Closing since there has been no response for more than 3 weeks. Please reopen if you still have questions, thanks!
The `fnet.conv1.weight` is shared by multiple convs, and TRT currently cannot constant-fold shared weights. Support for dynamic weight inputs could fix this; it is already in the TRT plan...
I created internal task 3741010 to track this issue. @deephog, could you wait for the next major release? Thanks!
The model (https://drive.google.com/file/d/1xJyU7CnVqzc8tBU0_ruewTD1fgxrz1EA/view?usp=sharing) will be fixed in 8.5 EA. Closing, thanks!
@gj-raza, currently the quantization is implemented as a sequence of PyTorch ops, and it could be accelerated with a CUDA extension. I will create an internal feature request for this, thanks!
Hello @zhangjoey115, I assume you are using the `mark all` function in Polygraphy. First, `mark all` can hide some issues when debugging accuracy; this is because without `mark...
@zhangjoey115, sorry for the delayed response. Could you share a simple repro ONNX model? Thanks!
Closing since there has been no response for more than 3 weeks. Please reopen if you still have questions, thanks!
Hello @Ricardosuzaku, the steps are correct. What accuracy do you get when you run the pytorch_quantization toolkit? Thanks