dropblock
inconsistency with the original paper
Hello, thanks for your nice code!
I found two inconsistencies with the original paper, and both are easy to fix:
- `gamma`: in the original paper, every block in `block_mask` is a complete square (or cube), since the mask centers are only sampled from the central part of the feature map.
- In the paper, different channels use different `mask`s, while in your implementation they all share the same one.
I only just noticed these, and I don't actually know whether they are effective tricks, since the paper doesn't discuss them in much detail :)
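As a reference point for the `gamma` point, here is a minimal sketch of the drop rate as given in the DropBlock paper. It assumes a square feature map of side `feat_size`, with block centers sampled only in the `(feat_size - block_size + 1)²` region where a full block fits, so every dropped block is a complete square. The name `dropblock_gamma` is hypothetical and not part of this repository's API.

```python
def dropblock_gamma(keep_prob: float, block_size: int, feat_size: int) -> float:
    """Bernoulli rate for block centers, following the DropBlock paper's formula.

    Assumes a square feat_size x feat_size feature map. Centers are only drawn
    from the (feat_size - block_size + 1)^2 valid region, so the rate is scaled
    up by feat_size^2 / valid^2 to keep the expected dropped fraction at
    roughly 1 - keep_prob.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if not 1 <= block_size <= feat_size:
        raise ValueError("block_size must be in [1, feat_size]")
    valid = feat_size - block_size + 1  # side length of the valid center region
    return (1.0 - keep_prob) / block_size ** 2 * feat_size ** 2 / valid ** 2


if __name__ == "__main__":
    print(dropblock_gamma(keep_prob=0.9, block_size=7, feat_size=14))
```

For example, with `keep_prob=0.9`, `block_size=7` and `feat_size=14`, the rate is 0.00625, which is 196/64 ≈ 3.06 times the uncorrected `(1 - keep_prob) / block_size**2`.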
The gamma issue is a minor thing but I can have a look at it.
The channels share the same mask in the paper.
“We experimented with a shared DropBlock mask across different feature channels or each feature channel has its DropBlock mask. Algorithm 1 corresponds to the latter, which tends to work better in our experiments.” (page 2 bottom line)
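To make the two options in the quoted passage concrete, here is a sketch of sampling a block mask either per channel (what the quote says Algorithm 1 does) or shared across channels. `sample_block_mask` and its `per_channel` flag are hypothetical names. The sketch assumes an odd `block_size` and grows each sampled center into a square with max pooling, which is one common way to do this but not necessarily how this repository does it.

```python
import torch
import torch.nn.functional as F


def sample_block_mask(x: torch.Tensor, gamma: float, block_size: int,
                      per_channel: bool = True) -> torch.Tensor:
    """Return a {0, 1} keep-mask for an (N, C, H, W) input.

    per_channel=True  -> shape (N, C, H, W): each channel gets its own mask.
    per_channel=False -> shape (N, 1, H, W): one mask broadcast over channels.
    Assumes an odd block_size <= min(H, W); centers are only drawn where a
    full block fits, so every dropped region is a complete square.
    """
    n, c, h, w = x.shape
    if block_size % 2 == 0 or block_size > min(h, w):
        raise ValueError("block_size must be odd and fit inside the feature map")
    pad = block_size // 2
    mask_c = c if per_channel else 1
    # Bernoulli(gamma) centers on the valid (H - bs + 1) x (W - bs + 1) region,
    # allocated directly on x's device and dtype
    centers = torch.bernoulli(torch.full(
        (n, mask_c, h - block_size + 1, w - block_size + 1), gamma,
        device=x.device, dtype=x.dtype))
    # zero-pad back to H x W so center coordinates line up with x
    centers = F.pad(centers, (pad, pad, pad, pad))
    # grow each center into a block_size x block_size square of ones
    dropped = F.max_pool2d(centers, kernel_size=block_size, stride=1, padding=pad)
    return 1.0 - dropped
```

The mask is applied as `x * mask`; the shared `(N, 1, H, W)` mask broadcasts over the channel dimension, while the per-channel mask already matches `x`'s shape.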
Sure, that is easily fixable
Expect it soon
Edit: you can also do a PR if you want
Hi, any updates on this? Best
I haven't had much free time to deal with this, but I will review and accept merge requests
I also found some differences between the paper and the code.
To solve this issue, you could have a look at this fork (only for DropBlock2D).
I would encourage you to do a pull request
If you do look at the code linked above, note that `mask_center` is not created on the input's device, so the `nn.ZeroPad2d` call runs on the CPU by default. Since I was training on a GPU, this slowed down a single forward pass of my model (which uses many DropBlocks) from 0.15 seconds to 3 seconds.
[Screenshot, 2022-01-24]
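A minimal sketch of the fix described above, assuming the slowdown comes only from `mask_center` being allocated on the CPU: creating it with `x.new_full(...)` makes it inherit the input's device and dtype, so the `nn.ZeroPad2d` call then runs on the GPU as well. `sample_padded_centers` is a hypothetical helper, not code from the fork.

```python
import torch
import torch.nn as nn


def sample_padded_centers(x: torch.Tensor, gamma: float, block_size: int) -> torch.Tensor:
    """Sample Bernoulli(gamma) block centers on x's device and pad them to x's size.

    Hypothetical helper: x.new_full(...) inherits x's device and dtype, so no
    tensor is ever created on the CPU when x lives on the GPU.
    """
    n, c, h, w = x.shape
    pad = block_size // 2
    mask_center = torch.bernoulli(
        x.new_full((n, c, h - block_size + 1, w - block_size + 1), gamma))
    # ZeroPad2d has no parameters; it runs on whatever device its input is on
    return nn.ZeroPad2d(pad)(mask_center)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(8, 64, 32, 32, device=device)
    centers = sample_padded_centers(x, gamma=0.05, block_size=7)
    print(centers.shape, centers.device)  # same device as x
```

The same idea applies to any other intermediate tensor in the forward pass: passing `device=x.device` (or using `x.new_*` constructors) keeps the whole DropBlock computation on the accelerator.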