
[BUG] No way provided to replicate fps on retrained models.

Huxwell opened this issue 2 years ago • 2 comments

Add Link

https://pytorch.org/tutorials/intermediate/realtime_rpi.html

Describe the bug

I am getting 25-30 fps on my Raspberry Pi 4 with the provided snippet. However, after fine-tuning mobilenet_v2 and applying:

# Quantize the model
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Convert the quantized model to TorchScript
script_model = torch.jit.script(quantized_model)

I am only getting 2.5 fps. The tutorial suggests:

You can create your own model or fine tune an existing one. If you fine tune on one of the models from [torchvision.models.quantized](https://pytorch.org/vision/stable/models.html#quantized-models) most of the work to fuse and quantize has already been done for you so you can directly deploy with good performance on a Raspberry Pi.

But it provides no guidance on how to do this. My attempts to do so failed:

torch.backends.quantized.engine = 'qnnpack'
model = models.quantization.mobilenet_v2(pretrained=True, quantize=True) # INT

num_classes = 3
model.classifier[1] = torch.nn.Linear(model.last_channel, num_classes)

would result in:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-48-ddcd2d77aac5> in <cell line: 24>()
     39 
     40         # Forward pass
---> 41         outputs = model(inputs)
     42         loss = criterion(outputs, labels)
     43 

6 frames

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py in forward(self, input)
    112 
    113     def forward(self, input: Tensor) -> Tensor:
--> 114         return F.linear(input, self.weight, self.bias)
    115 
    116     def extra_repr(self) -> str:

RuntimeError: mat1 and mat2 must have the same dtype

Multiple attempts to create a custom Linear layer that supports the int8 dtype also failed.

Describe your environment

not relevant

cc @datumbox @nairbv @fmassa @NicolasHug @YosuaMichael

Huxwell avatar Jun 24 '23 12:06 Huxwell

@d4l3k can you help?

svekars avatar Jun 26 '23 15:06 svekars

@Huxwell the issue is that you're loading a quantized model via quantize=True and then mixing quantized and non-quantized layers. You need to load the unquantized model, modify/train/fine-tune it, and then apply quantization.

I have an example of this conversion script at https://github.com/d4l3k/friday/blob/master/convert.py#L41-L68
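Roughly, the flow looks like this (a minimal sketch, not the exact convert.py; the training loop and calibration_loader below are placeholders you'd replace with your own data pipeline):

import torch
from torchvision import models

torch.backends.quantized.engine = 'qnnpack'

# 1. Load the FLOAT quantizable model (quantize=False) and swap the head
model = models.quantization.mobilenet_v2(pretrained=True, quantize=False)
num_classes = 3
model.classifier[1] = torch.nn.Linear(model.last_channel, num_classes)

# 2. Fine-tune the float model as usual (training loop omitted here)
# train_one_epoch(model, criterion, optimizer, train_loader)  # placeholder

# 3. Fuse conv/bn/relu, insert observers, calibrate, then convert to int8
model.eval()
model.fuse_model()
model.qconfig = torch.quantization.get_default_qconfig('qnnpack')
torch.quantization.prepare(model, inplace=True)

with torch.inference_mode():
    for images, _ in calibration_loader:  # placeholder: a few representative batches
        model(images)

torch.quantization.convert(model, inplace=True)

# 4. TorchScript it for deployment on the Pi
torch.jit.script(model).save('mobilenet_v2_finetuned_quantized.pt')

The important part is that the whole model ends up quantized so the convolutions run through QNNPACK on the Pi; dynamically quantizing only the Linear layers leaves the convs in fp32, which is likely why your scripted model was so slow.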

Agreed, the docs on this are pretty sparse: https://pytorch.org/vision/main/models/generated/torchvision.models.quantization.mobilenet_v2.html#torchvision.models.quantization.mobilenet_v2

cc @datumbox for improving the quantized model training documentation

@Huxwell if you do get this all working, I'd love to accept contributions for improving/fleshing out the rpi tutorial with fine tuning instructions

Let me know if you have any questions on implementation details, etc

d4l3k avatar Jun 26 '23 20:06 d4l3k