FPT
Implementation for paper: Feature Pyramid Transformer
May I ask why you apply the rendering transformer to the entire feature map instead of to individual pixels? Does it work well if you apply the rendering transformer the same way for grounding...
Could you please share your trained models?
Dear author, thanks for your work. When I input the feature maps with sizes of [torch.Size([1, 256, 320, 208]), torch.Size([1, 512, 160, 104]), torch.Size([1, 1024, 80, 52]), torch.Size([1, 2048, 40,...