vit-pytorch
Depthwise conv definition
https://github.com/lucidrains/vit-pytorch/blob/60b5687a7997f41c855ebc78ff77040ac5da5b61/vit_pytorch/pit.py#L92
The first layer in the depthwise conv should double the number of channels, as per Fig. 4 of the PiT paper. It also makes sense to me to double the channels immediately; if not, the first strided-but-equal-channel conv will form an information bottleneck relative to the high-res activation map upstream.
Not sure what that means for the groups argument. I suppose it can't go higher than the number of input channels, so it stays the same; each input plane is treated independently, but then produces two independent output planes?
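For what it's worth, here is a minimal sketch of the channel bookkeeping behind that question. It assumes the grouping convention documented for PyTorch's `nn.Conv2d`: `groups` must divide both channel counts, and each group's output channels are contiguous. The helper names `grouped_conv_channel_map` and `grouped_conv_weight_count` are hypothetical, written only for illustration, and the sizes are made up.

```python
def grouped_conv_channel_map(in_channels, out_channels, groups):
    """For each output channel, list the input channels it reads from."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("groups must divide both in_channels and out_channels")
    in_per_group = in_channels // groups
    out_per_group = out_channels // groups
    mapping = []
    for o in range(out_channels):
        g = o // out_per_group  # group that output channel o belongs to
        mapping.append(list(range(g * in_per_group, (g + 1) * in_per_group)))
    return mapping


def grouped_conv_weight_count(in_channels, out_channels, groups, kernel_size):
    """Number of weights (bias excluded) in a square-kernel grouped conv."""
    return out_channels * (in_channels // groups) * kernel_size * kernel_size


dim_in = 4  # hypothetical channel count
# groups stays at dim_in while the output channels double
channel_map = grouped_conv_channel_map(dim_in, 2 * dim_in, groups=dim_in)
print(channel_map)

depthwise_weights = grouped_conv_weight_count(dim_in, 2 * dim_in, dim_in, 3)
dense_weights = grouped_conv_weight_count(dim_in, 2 * dim_in, 1, 3)
print(depthwise_weights, dense_weights)
```

Under that convention, with `groups == dim_in` and twice as many output channels, each input plane feeds exactly two output planes, each with its own kernel, and no output plane mixes input planes.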
@EelcoHoogendoorn hmm, i do increase the channels at https://github.com/lucidrains/vit-pytorch/blob/60b5687a7997f41c855ebc78ff77040ac5da5b61/vit_pytorch/pit.py#L103
Yeah, but it's (dim_in, dim_in) in the first strided conv layer. Don't know if it's a massive difference in practice, but it's not according to the definition in the original paper.
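To make the two arrangements concrete, here is a rough PyTorch sketch, not the code from `pit.py`. The kernel size, padding, stride and channel counts are assumptions picked for illustration, and `variant_a` / `variant_b` are hypothetical names. Variant A keeps `(dim_in, dim_in)` in the strided depthwise layer and widens in the 1x1 conv; variant B widens in the strided depthwise layer itself.

```python
import torch
from torch import nn

dim_in, dim_out = 4, 8  # hypothetical sizes, dim_out = 2 * dim_in

# Variant A: strided depthwise conv keeps dim_in channels,
# the pointwise (1x1) conv doubles them afterwards.
variant_a = nn.Sequential(
    nn.Conv2d(dim_in, dim_in, kernel_size=3, padding=1, stride=2, groups=dim_in),
    nn.Conv2d(dim_in, dim_out, kernel_size=1),
)

# Variant B: strided depthwise conv doubles the channels itself
# (groups stays at dim_in), followed by a same-width pointwise conv.
variant_b = nn.Sequential(
    nn.Conv2d(dim_in, dim_out, kernel_size=3, padding=1, stride=2, groups=dim_in),
    nn.Conv2d(dim_out, dim_out, kernel_size=1),
)

x = torch.randn(1, dim_in, 8, 8)

# Activations right after the strided layer, where the bottleneck would sit.
mid_a = variant_a[0](x)
mid_b = variant_b[0](x)
print(x.numel(), mid_a.numel(), mid_b.numel())

out_a = variant_a(x)
out_b = variant_b(x)
print(out_a.shape, out_b.shape)
```

Both variants end at the same output shape; they differ in the width of the intermediate map. With stride 2, variant A keeps a quarter of the input's activations after the first layer, while variant B keeps half.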
@EelcoHoogendoorn oh got it! yea, i think there's a couple ways to do depthwise conv, fixing this!
@EelcoHoogendoorn ok, let me know if https://github.com/lucidrains/vit-pytorch/releases/tag/0.16.13 looks good to you
Yeah, not a depthwise-conv expert by any means; in fact this was the first time I dug into the construct, because I got annoyed with the PiT paper not really explaining it, imo. But if there are any differences left with the intended meaning in the PiT paper, I can't spot them.