3detr
Why do you perform downsampling after the first layer in 3DETR-m, rather than after the whole encoder?
Would downsampling after the whole encoder lead to a performance drop, or is the choice because of the computational cost?
We followed PointNet++ for this design decision, where the downsampling is performed after the first layer. In initial experiments, directly downsampling gave worse results.
To clarify, I mean "PointNetSA -> Encoder -> Encoder -> Encoder -> DownSampling" rather than "PointNetSA -> Encoder -> DownSampling -> Encoder -> Encoder". Since downsampling in PointNet++ is known to lose information, "PointNetSA -> DownSampling -> Encoder -> Encoder -> Encoder" would not be a good choice.
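On the computational-cost side of the question, here is a rough sketch comparing the three orderings discussed above. It only counts the quadratic self-attention term (N^2 per encoder layer) and leaves out PointNetSA, since that stage is common to all three. The function name `attention_cost`, the point count of 2048, and the downsampling ratio of 2 are illustrative assumptions, not values taken from this thread or from the 3DETR code, and the sketch says nothing about accuracy.

```python
# Rough, hypothetical cost model: each encoder layer costs N^2 (pairwise
# self-attention over N points); DownSampling divides N by `ratio`.
# This is NOT the 3DETR implementation, only a back-of-the-envelope comparison.

def attention_cost(pipeline, num_points, ratio=2):
    """Return (total relative attention cost, number of points at the output)."""
    cost = 0
    n = num_points
    for stage in pipeline:
        if stage == "Encoder":
            cost += n * n          # quadratic self-attention term
        elif stage == "DownSampling":
            n //= ratio            # fewer points for all later layers
        else:
            raise ValueError(f"unknown stage: {stage}")
    return cost, n


# The three orderings from the discussion (PointNetSA omitted: shared by all).
ORDERINGS = {
    "downsample_last": ["Encoder", "Encoder", "Encoder", "DownSampling"],
    "downsample_after_first": ["Encoder", "DownSampling", "Encoder", "Encoder"],
    "downsample_first": ["DownSampling", "Encoder", "Encoder", "Encoder"],
}

if __name__ == "__main__":
    num_points = 2048  # assumed number of points after PointNetSA
    for name, pipeline in ORDERINGS.items():
        cost, n_out = attention_cost(pipeline, num_points)
        print(f"{name}: relative cost = {cost}, output points = {n_out}")
```

Under these assumptions, downsampling after the first layer costs half as much attention compute as downsampling at the very end, while still letting one encoder layer see the full-resolution points. Whether it also matches or beats the other orderings in accuracy would need to be checked experimentally.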