Combining-EfficientNet-and-Vision-Transformers-for-Video-Deepfake-Detection

Patch size

Open · NicolasIannuzzi opened this issue 2 years ago · 2 comments

Hi, is it possible to edit the patch size and the related options (e.g. 14 instead of 7)? Or is it required to retrain the model? Thanks.

NicolasIannuzzi avatar Jun 30 '22 08:06 NicolasIannuzzi

Hello, you can edit the patch size, but you need to do it properly. The EfficientNet returns feature maps with different shapes depending on the block:

```
 1  torch.Size([64, 24, 56, 56])
 2  torch.Size([64, 24, 56, 56])
 3  torch.Size([64, 40, 28, 28])
 4  torch.Size([64, 40, 28, 28])
 5  torch.Size([64, 80, 14, 14])
 6  torch.Size([64, 80, 14, 14])
 7  torch.Size([64, 80, 14, 14])
 8  torch.Size([64, 112, 14, 14])
 9  torch.Size([64, 112, 14, 14])
10  torch.Size([64, 112, 14, 14])
11  torch.Size([64, 192, 7, 7])
12  torch.Size([64, 192, 7, 7])
13  torch.Size([64, 192, 7, 7])
14  torch.Size([64, 192, 7, 7])
15  torch.Size([64, 320, 7, 7])
```
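If it helps, this is roughly how those shapes can be inspected. A minimal sketch, assuming the `efficientnet_pytorch` package and a 224×224 input; the internal attributes (`_conv_stem`, `_bn0`, `_swish`, `_blocks`) are implementation details of that library:

```python
import torch
from efficientnet_pytorch import EfficientNet

# Sketch: print the output shape of every MBConv block of an EfficientNet-B0
# to pick which feature map to feed into the Vision Transformer.
model = EfficientNet.from_name('efficientnet-b0')
model.eval()

x = torch.randn(64, 3, 224, 224)  # batch of 64 RGB crops, 224x224
with torch.no_grad():
    x = model._swish(model._bn0(model._conv_stem(x)))  # stem
    for i, block in enumerate(model._blocks, start=1):
        x = block(x)
        print(i, x.shape)  # e.g. torch.Size([64, 24, 56, 56])
```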

Starting from one of these outputs, you need to adapt the rest of the settings. My answer to this other issue may be useful for that: https://github.com/davide-coccomini/Combining-EfficientNet-and-Vision-Transformers-for-Video-Deepfake-Detection/issues/27#issuecomment-1120442441

Be careful: in that answer I referred to the table from the Medium article, but it does not seem to be entirely correct; use the one I wrote in this answer instead.
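As a rough illustration of what "adapting the rest of the settings" involves: the patch size must evenly divide the spatial size of the chosen feature map, and the patch embedding dimension follows from the channel count. The names below are illustrative, not the repository's actual configuration keys:

```python
# Suppose you pick the block that outputs [B, 80, 14, 14] and want patch size 14.
channels, fmap_size = 80, 14
patch_size = 14

assert fmap_size % patch_size == 0, "patch size must divide the feature map size"
num_patches = (fmap_size // patch_size) ** 2  # (14 // 14)^2 = 1 patch
patch_dim = channels * patch_size ** 2        # 80 * 14 * 14 = 15680 values per patch
print(num_patches, patch_dim)  # the patch-to-embedding Linear must accept patch_dim inputs
```

With a patch size of 7 on the same feature map you would instead get four patches of dimension 80 · 49 = 3920.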

Obviously, if you change the patch size and other settings, you will need to retrain the network with the new architecture.

davide-coccomini avatar Jun 30 '22 11:06 davide-coccomini

Ok, thank you for the related parameters. I will try one of those and then retrain the network.

Edit:

In the row `1 torch.Size([64, 24, 56, 56])`, 64 is the dim-head, 24 is the channel count, and 56 the patch size, right?

NicolasIannuzzi avatar Jul 01 '22 14:07 NicolasIannuzzi