Combining-EfficientNet-and-Vision-Transformers-for-Video-Deepfake-Detection
Patch size
Hi, is it possible to edit the patch size and the related options (e.g. 14 instead of 7), or is it required to retrain the model? Thanks.
Hello, you can edit the patch size, but you need to do it properly. EfficientNet returns feature maps with different shapes depending on the block:
```
1  torch.Size([64, 24, 56, 56])
2  torch.Size([64, 24, 56, 56])
3  torch.Size([64, 40, 28, 28])
4  torch.Size([64, 40, 28, 28])
5  torch.Size([64, 80, 14, 14])
6  torch.Size([64, 80, 14, 14])
7  torch.Size([64, 80, 14, 14])
8  torch.Size([64, 112, 14, 14])
9  torch.Size([64, 112, 14, 14])
10 torch.Size([64, 112, 14, 14])
11 torch.Size([64, 192, 7, 7])
12 torch.Size([64, 192, 7, 7])
13 torch.Size([64, 192, 7, 7])
14 torch.Size([64, 192, 7, 7])
15 torch.Size([64, 320, 7, 7])
```
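If you want to inspect these shapes yourself, a minimal sketch along these lines should work. This is not the repository's exact code; the `efficientnet_pytorch` package, the B0 variant, the batch size and the 224x224 input resolution are all assumptions:

```python
import torch
from efficientnet_pytorch import EfficientNet

# Assumed setup: EfficientNet-B0 and 224x224 face crops in a batch of 64.
model = EfficientNet.from_pretrained('efficientnet-b0')
model.eval()

x = torch.randn(64, 3, 224, 224)
with torch.no_grad():
    x = model._swish(model._bn0(model._conv_stem(x)))  # stem
    for i, block in enumerate(model._blocks, start=1):
        x = block(x)
        print(i, x.shape)  # feature-map shape after each block
```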
Starting from one of these outputs, you need to adapt the rest of the settings. My answer to this other issue may be useful: https://github.com/davide-coccomini/Combining-EfficientNet-and-Vision-Transformers-for-Video-Deepfake-Detection/issues/27#issuecomment-1120442441
Note that in that answer I referred to the table from the Medium article, but it does not seem to be entirely correct; use the one I wrote in this answer instead.
Obviously if you change the patch size and other settings you will need to retrain the network with the new architecture.
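To give an idea of what has to be adapted, here is the arithmetic for the patch embedding when you pick a different block output. This is only a sketch under assumed values; the variable names are illustrative and not the repository's actual configuration keys:

```python
# Suppose you take a 14x14 feature map with 112 channels (blocks 8-10 above)
# instead of a 7x7 one, and split it into 7x7 patches.
channels, feature_size = 112, 14
patch_size = 7                                     # must divide feature_size evenly

num_patches = (feature_size // patch_size) ** 2    # 4 patches per image
patch_dim = channels * patch_size ** 2             # 5488 = input dim of the linear projection

print(num_patches, patch_dim)
```

Whatever block you pick, the Transformer's linear projection and positional embeddings have to match the new number of patches and patch dimension, which is why retraining is required.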
Ok, thank you for the related parameters. I will try one of them and then I'll retrain the network.
Edit:
In the row `1 torch.Size([64, 24, 56, 56])`, 64 is the dim-head, 24 the channels, and 56 the patch size, right?