FeatUp
Question about input resolution and text encoder
Hi, thank you for the great work! I have two questions about upsampling my features:
- My images are not square. Should I resize them to 224x224, or can I resize them to 224x332 to preserve the aspect ratio? I noticed in #39 that the models were trained at 224x224.
- When I upsample CLIP features, which text encoder should I use? Can I load a CLIP text encoder independently, or do I need to use the text encoder that matches the FeatUp feature backbone? How can I obtain the text encoder for the "maskclip" and "clip" backbones in FeatUp?
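The aspect-preserving resize from the first question can be sketched as below. This is a minimal illustration that assumes the shorter side is scaled to 224. It also includes optional rounding of both sides to a multiple of the backbone's patch size. The function name `aspect_preserving_size` and the `patch_size` parameter are hypothetical, and the idea that the input must be divisible by the patch size is an assumption that the issue does not state.

```python
def aspect_preserving_size(height, width, short_side=224, patch_size=None):
    """Return (new_h, new_w) with the shorter side scaled to `short_side`.

    Hypothetical helper. If `patch_size` is given, each side is rounded to the
    nearest multiple of it (assumption: ViT backbones usually expect input
    sizes divisible by their patch size).
    """
    # Scale factor that maps the shorter side onto `short_side`.
    scale = short_side / min(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    if patch_size is not None:
        # Snap each side to the nearest multiple of the patch size (at least one patch).
        new_h = max(patch_size, round(new_h / patch_size) * patch_size)
        new_w = max(patch_size, round(new_w / patch_size) * patch_size)
    return new_h, new_w


if __name__ == "__main__":
    print(aspect_preserving_size(480, 640))                 # shorter side -> 224
    print(aspect_preserving_size(224, 332, patch_size=16))  # snapped to multiples of 16
```

With `patch_size=16`, the 224x332 size from the question becomes 224x336. Whether that rounding is actually required depends on the backbone and is not confirmed here.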