
Depthwise conv definition

Open EelcoHoogendoorn opened this issue 4 years ago • 5 comments

https://github.com/lucidrains/vit-pytorch/blob/60b5687a7997f41c855ebc78ff77040ac5da5b61/vit_pytorch/pit.py#L92

The first layer in the depthwise conv should double the number of channels, as per Fig. 4 of the PiT paper. Doubling the channels immediately also makes sense to me: if not, the first strided-but-equal-channel conv forms an information bottleneck relative to the high-res activation map upstream.

Not sure what that means for the groups argument. I suppose it can't go higher than the number of input channels, so it stays the same; each input plane is treated independently, but then produces two independent output planes?

EelcoHoogendoorn avatar Apr 29 '21 10:04 EelcoHoogendoorn
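
The question about `groups` can be checked directly in PyTorch. The following is a minimal sketch, not the code from `pit.py`: `dim_in = 4`, the kernel size, stride, and padding are illustrative assumptions. It keeps `groups` equal to the number of input channels while doubling the output channels, then perturbs one input channel to see which output channels react.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dim_in = 4  # illustrative value, not taken from the repository

# groups stays at dim_in; out_channels is a multiple of groups (here 2x).
conv = nn.Conv2d(dim_in, 2 * dim_in, kernel_size=3, stride=2, padding=1, groups=dim_in)

# Each filter sees exactly one input channel: (out_channels, in_channels / groups, kH, kW).
weight_shape = tuple(conv.weight.shape)  # (8, 1, 3, 3)

x = torch.randn(1, dim_in, 8, 8)
y = conv(x)
out_shape = tuple(y.shape)  # (1, 8, 4, 4)

# Perturb only input channel 1 and record which output channels change.
x2 = x.clone()
x2[:, 1] += 1.0
diff = (conv(x2) - y).abs().amax(dim=(0, 2, 3))
changed = (diff > 1e-6).nonzero().flatten().tolist()
print(weight_shape, out_shape, changed)
```

In this sketch only output channels 2 and 3 respond to input channel 1, which matches the reading above: each input plane is filtered independently and yields two output planes.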

@EelcoHoogendoorn hmm, i do increase the channels at https://github.com/lucidrains/vit-pytorch/blob/60b5687a7997f41c855ebc78ff77040ac5da5b61/vit_pytorch/pit.py#L103

lucidrains avatar Apr 29 '21 16:04 lucidrains

Yeah, but it's (dim_in, dim_in) in the first strided conv layer. Don't know if it's a massive difference in practice, but it's not according to the definition in the original paper.

EelcoHoogendoorn avatar Apr 29 '21 19:04 EelcoHoogendoorn
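
To make the difference between the two layouts concrete, here is a hedged sketch comparing them. The function names `equal_channel_first` and `expand_first` are hypothetical, the 1x1 second layer is an assumption about how the channel increase is done, and the sizes (64 to 128 channels, 32x32 input, kernel 3, stride 2) are illustrative rather than taken from the repository or the paper.

```python
import torch
import torch.nn as nn

def equal_channel_first(dim_in, dim_out, k=3, stride=2, padding=1):
    # Strided depthwise conv keeps dim_in channels; a 1x1 conv expands afterwards.
    return nn.Sequential(
        nn.Conv2d(dim_in, dim_in, k, stride=stride, padding=padding, groups=dim_in),
        nn.Conv2d(dim_in, dim_out, kernel_size=1),
    )

def expand_first(dim_in, dim_out, k=3, stride=2, padding=1):
    # Strided depthwise conv expands to dim_out immediately (groups stays at dim_in).
    return nn.Sequential(
        nn.Conv2d(dim_in, dim_out, k, stride=stride, padding=padding, groups=dim_in),
        nn.Conv2d(dim_out, dim_out, kernel_size=1),
    )

dim_in, dim_out = 64, 128
x = torch.randn(1, dim_in, 32, 32)  # 65536 input values

net_a = equal_channel_first(dim_in, dim_out)
net_b = expand_first(dim_in, dim_out)

# Size of the activation between the two layers: this is where a bottleneck would sit.
mid_a = net_a[0](x).numel()  # 64 * 16 * 16
mid_b = net_b[0](x).numel()  # 128 * 16 * 16

out_a_shape = tuple(net_a(x).shape)
out_b_shape = tuple(net_b(x).shape)
print(mid_a, mid_b, out_a_shape, out_b_shape)
```

Both variants end at the same output shape, but the equal-channel variant passes through an intermediate with a quarter of the input's values, while the expand-first variant retains half.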

@EelcoHoogendoorn oh got it! yea, i think there's a couple ways to do depthwise conv, fixing this!

lucidrains avatar Apr 29 '21 19:04 lucidrains

@EelcoHoogendoorn ok, let me know if https://github.com/lucidrains/vit-pytorch/releases/tag/0.16.13 looks good to you

lucidrains avatar Apr 29 '21 19:04 lucidrains

Yeah, I'm not a depthwise-conv expert by any means; in fact this was the first time I dug into the construct, because I got annoyed with the PiT paper not really explaining it, imo. But if there are any differences left with the intended meaning in the PiT paper, I can't spot any.

EelcoHoogendoorn avatar Apr 29 '21 20:04 EelcoHoogendoorn