Does the SAM2 implementation need any code modifications to handle different input resolutions?

Open GoldenFishes opened this issue 1 year ago • 1 comments

Thank you so much for your great work! I would like to know if I need to modify the SAM2 code to handle different input resolutions. Which parts of the code should I modify? Also, what is the actual speed difference for different resolutions?

Dec 02 '24 02:12 GoldenFishes

Thanks for checking out the repo!

For the original SAM2 code, the video predictor already supports resolution changes without modifying the code; you just need to change the image_size config parameter. The image predictor does require some (small) changes to make the bb_feat_sizes parameter dynamic. There's a fuller description in issue #257.
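As a rough illustration of what "dynamic" could mean here, this is a minimal sketch that derives the feature sizes from the input size instead of hardcoding them. It assumes square inputs and that the three backbone feature maps sit at strides 4, 8 and 16 relative to the input (which would give 256, 128 and 64 at 1024px). The function name `backbone_feature_sizes` is made up for this example and is not part of the SAM2 code:

```python
from typing import List, Sequence, Tuple


def backbone_feature_sizes(
    image_size: int,
    strides: Sequence[int] = (4, 8, 16),  # assumed strides, not read from SAM2
) -> List[Tuple[int, int]]:
    """Return one (height, width) feature-map size per stride for a square input."""
    for stride in strides:
        if image_size % stride != 0:
            raise ValueError(
                f"image_size={image_size} is not divisible by stride {stride}"
            )
    return [(image_size // stride, image_size // stride) for stride in strides]


if __name__ == "__main__":
    for size in (1024, 512):
        print(size, backbone_feature_sizes(size))
```

The idea would be to compute something like this from image_size wherever the image predictor currently sets bb_feat_sizes, so the two values can't get out of sync.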

There's about a 4x speedup going from 1024px down to 512px. Unfortunately, the SAM v2 models don't handle resolution changes very well, so using intermediate resolutions (e.g. 768px) to get a better speed/accuracy tradeoff usually isn't an option.
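The 4x figure lines up with simple pixel-count scaling: halving the side length leaves a quarter of the pixels to process. Here is a back-of-the-envelope sketch of that arithmetic, assuming runtime scales roughly linearly with the number of input pixels (an approximation, not a benchmark; `estimated_speedup` is a made-up helper name):

```python
def estimated_speedup(image_size: int, base_size: int = 1024) -> float:
    """Estimate speedup vs. base_size, assuming cost is proportional to pixel count."""
    if image_size <= 0 or base_size <= 0:
        raise ValueError("sizes must be positive")
    return (base_size / image_size) ** 2


if __name__ == "__main__":
    for size in (1024, 768, 512):
        print(f"{size}px: ~{estimated_speedup(size):.2f}x")
```

By the same estimate, 768px would land somewhere around 1.8x, though actual timings will depend on the model size and hardware.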

Dec 02 '24 12:12 heyoeyo