FeatUp: Question about input resolution and text encoder


Chuan-10 opened this issue.

Hi, thank you for the great work! I have two questions about upsampling my features:

1. My images are not square. Should I resize them to 224x224, or can I resize them to 224x332 to preserve the height-to-width aspect ratio? I noticed in #39 that the module was trained at 224x224.
2. When I upsample CLIP features, which text encoder should I use? Can I load a CLIP text encoder independently, or do I need to use the text encoder from the FeatUp feature backbone? How can I obtain the text encoder for the "maskclip" and "clip" backbones in FeatUp?
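For the first question, a side note on the sizing arithmetic: a ViT backbone splits the input into fixed-size patches, so a side length that is not a multiple of the patch size may leave a partial patch that the patch embedding drops. Below is a minimal sketch, assuming a patch size of 16 (as in ViT-B/16) and using a hypothetical helper `aspect_preserving_size` that is not part of FeatUp. It scales the short side to 224, keeps the aspect ratio, and snaps both sides to a multiple of the patch size.

```python
from typing import Tuple


def aspect_preserving_size(height: int, width: int,
                           short_side: int = 224,
                           patch_size: int = 16) -> Tuple[int, int]:
    """Hypothetical helper (not a FeatUp API).

    Scales the image so the shorter side is about `short_side` while keeping
    the aspect ratio, then rounds each side to the nearest multiple of
    `patch_size` (assumed 16 here, as for a ViT-B/16 backbone).
    """
    min_side = min(height, width)

    def snap(side: int) -> int:
        # Scale by the same factor on both axes to preserve the aspect ratio.
        scaled = side * short_side / min_side
        # Round to a whole number of patches, never fewer than one patch.
        return max(patch_size, round(scaled / patch_size) * patch_size)

    return snap(height), snap(width)


if __name__ == "__main__":
    # A 480x712 (H x W) image: plain scaling gives about 224x332,
    # but 332 is not a multiple of 16, so the width is snapped to 336.
    print(aspect_preserving_size(480, 712))  # (224, 336)
```

This only addresses the input-size arithmetic; whether the upsampler, trained at 224x224, behaves well on non-square inputs is the open question being asked.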

Apr 05 '24 02:04 Chuan-10