askerlee

61 comments by askerlee

**EDIT**: Oh, sorry, my mistake. I've now checked the "faster-rcnn" branch and it works! Thanks @nils489 > Probably @nils489 forked the wrong branch...

@ZhouYanzhao Oh, I think I finally understand what you mean. "N-directional points" means an activation value at (x,y), doesn't it? The activation value is a scalar for canonical conv filters, but...

@fding Thanks for the nice reply. I wonder how it was evaluated on Sintel and KITTI? Perceiver IO doesn't seem to appear on either leaderboard. Did you have internal channels for...

Thank you @fding ! This is very helpful.

BTW, the transformer layers used by huggingface's ViT are basically a verbatim copy of the transformer used in their BERT model.

I also checked rwightman's pytorch-image-models. The vision transformer he implemented has all the dropouts. The dropout rate is 0.1, the same as yours.

It won't have a big impact, but maybe a fraction of a point. I don't have enough computational resources to find out...

This is weird. The (predicted noise -> x_0 -> x_{t-1}) path uses eq. 9, and your implementation uses eq. 10. I've verified that they are mathematically equivalent. I'd suggest you check the...
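
For anyone who wants to sanity-check this kind of equivalence numerically, here is a minimal sketch. The comment does not say which paper the equation numbers refer to, so this is an assumption: it uses the standard DDPM-style parameterization, where the reverse-step mean can be computed either by first recovering x_0 from the predicted noise and then taking the posterior mean, or directly from the predicted noise. The function names and the linear beta schedule are hypothetical, chosen only for illustration.

```python
import math


def make_schedule(num_steps=10, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear beta schedule (an assumption, not from the thread)."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a  # cumulative product of alphas
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def mean_via_x0(x_t, eps, t, betas, alphas, alpha_bars):
    """Path A: predicted noise -> x_0 -> posterior mean of x_{t-1}."""
    abar_t = alpha_bars[t]
    abar_prev = alpha_bars[t - 1] if t > 0 else 1.0
    # Recover x_0 from x_t and the predicted noise.
    x0 = (x_t - math.sqrt(1.0 - abar_t) * eps) / math.sqrt(abar_t)
    # Posterior mean of q(x_{t-1} | x_t, x_0).
    coef_x0 = math.sqrt(abar_prev) * betas[t] / (1.0 - abar_t)
    coef_xt = math.sqrt(alphas[t]) * (1.0 - abar_prev) / (1.0 - abar_t)
    return coef_x0 * x0 + coef_xt * x_t


def mean_direct(x_t, eps, t, betas, alphas, alpha_bars):
    """Path B: mean of x_{t-1} written directly in terms of the predicted noise."""
    scale = betas[t] / math.sqrt(1.0 - alpha_bars[t])
    return (x_t - scale * eps) / math.sqrt(alphas[t])


if __name__ == "__main__":
    betas, alphas, alpha_bars = make_schedule()
    for t in range(len(betas)):
        a = mean_via_x0(0.3, -1.2, t, betas, alphas, alpha_bars)
        b = mean_direct(0.3, -1.2, t, betas, alphas, alpha_bars)
        print(t, a, b, math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12))
```

If the two paths agree like this on scalars but the full implementation still differs, the discrepancy is more likely in something outside the mean itself (for example clipping of x_0, the variance term, or timestep indexing), though that is only a guess.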

What's the dataset? It's very hard to compare at an early stage. Some methods converge faster but to a less optimal state.

One thing to note is that the T.InterpolationMode.* values seem to be equivalent to the corresponding Image.* constants. So we can safely ignore this warning (at least for now).