GroundingDINO: Is it possible to generate embeddings that can be queried later?

The title pretty much says it. SAM has the idea of encode once, decode any time later, which helps when engineering systems around it. Can something similar be implemented in GroundingDINO? Can the encoder's embeddings be cached so that they can later be decoded and matched against an input text prompt at runtime?
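A minimal sketch of the encode-once, decode-later pattern, assuming the model can be split into an expensive text-independent step and a cheaper text-conditioned step. One caveat worth checking: as described in the Grounding DINO paper, its encoder includes a feature enhancer that fuses image and text features via cross-attention, so unlike SAM's image encoder, the full encoder output likely depends on the prompt. The cacheable, text-independent part would then be something like the image backbone output. `toy_encode_image`, `toy_decode_with_text` and `EmbeddingCache` are hypothetical toy stand-ins, not GroundingDINO APIs.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

ImageFeatures = List[float]

# Hypothetical stand-in for the expensive, text-independent step
# (in a real system, perhaps the image backbone). Toy version: normalize pixels.
def toy_encode_image(pixels: Sequence[float]) -> ImageFeatures:
    total = sum(pixels) or 1.0
    return [p / total for p in pixels]

# Hypothetical stand-in for the cheap, prompt-conditioned step run at query time.
# Toy version: a score weighted by the word lengths of the prompt.
def toy_decode_with_text(features: ImageFeatures, prompt: str) -> float:
    weights = [len(word) for word in prompt.split()] or [0]
    return sum(f * weights[i % len(weights)] for i, f in enumerate(features))

@dataclass
class EmbeddingCache:
    """Caches text-independent image features keyed by image content."""
    encode: Callable[[Sequence[float]], ImageFeatures]
    store: Dict[str, ImageFeatures] = field(default_factory=dict)
    encode_calls: int = 0

    @staticmethod
    def key(pixels: Sequence[float]) -> str:
        # Content hash, so identical images share one cached entry.
        return hashlib.sha256(repr(list(pixels)).encode()).hexdigest()

    def get(self, pixels: Sequence[float]) -> ImageFeatures:
        k = self.key(pixels)
        if k not in self.store:
            # Run the expensive encoder only on a cache miss.
            self.store[k] = self.encode(pixels)
            self.encode_calls += 1
        return self.store[k]

cache = EmbeddingCache(encode=toy_encode_image)
image = [0.2, 0.5, 0.3]
for prompt in ["cat", "red car", "person . dog"]:
    # Each new prompt reuses the cached features; only the decode step reruns.
    score = toy_decode_with_text(cache.get(image), prompt)
    print(prompt, round(score, 3))
print("encoder calls:", cache.encode_calls)
```

Three prompts against the same image trigger a single encoder call. For a real model, the in-memory dict could be replaced by on-disk storage of the cached tensors, keyed the same way.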

Jun 06 '23 05:06 rsnk96

Thanks for your valuable question.

I believe it is possible to compute and store only the embedding features in Grounding DINO, though this would currently require modifying the code. We will try to support this in a later update. Alternatively, it would be helpful if you'd like to join us by submitting PRs.

Jun 07 '23 19:06 SlongLiu

Sure, I'd like to try contributing this. Can you recommend which section of the codebase might be a good starting point to look into this?

Jun 08 '23 11:06 rsnk96

Hey, was this done?

Jul 01 '24 01:07 yashtomar31

Are there any updates on this by any chance, @SlongLiu? I would be interested in working on this as well. Could you guide me on where to start?

Mar 31 '25 20:03 JamesEmi