Nouri Alexander Hilscher
Nouri Alexander Hilscher
As a quick fix (or another solution that goes into a different direction) that I had to come up with because I had to make it work in an environment...
As far as I know, I would recommend using the patch tokens for your case. The final class token characterizes the image as a complete entity, giving you the ability...
I think the patch tokens should suffice, but I didn't try to include the class tokens. If you prepend them to your patch embeddings by concatenating, I would assume that...
This seems like an issue not related to RAM, but rather to GroundingDINO/Grounding SAM. There is a version of the Multi-Scale Deformable Attention Function that this library attempts to build...