ovdet
About pseudo words
Hi, thanks for sharing such great work. Exploiting the local scene structure inherently captured by VLMs is a very novel perspective.
I have one question after reading the paper. In Fig. 2(a), a linear layer maps a region feature to pseudo words. What exactly are these pseudo words? Are they a set of word embeddings, or just indices of words within a vocabulary?
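To make the question concrete, here is a minimal sketch of the two readings I have in mind. It is only an illustration: every dimension, variable name, and weight below is my own assumption and is not taken from the paper or the ovdet code.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes are hypothetical, chosen only to keep the example small.
REGION_DIM = 8    # dimension of a pooled region feature
EMBED_DIM = 4     # dimension of a text-encoder word embedding
NUM_PSEUDO = 2    # number of pseudo words produced per region
VOCAB_SIZE = 10   # size of a hypothetical token vocabulary

# Stand-in for one region feature from the detector.
region_feature = rng.standard_normal(REGION_DIM)

# Reading A: pseudo words are continuous vectors in word-embedding space.
# The linear layer outputs NUM_PSEUDO embeddings directly.
W_embed = rng.standard_normal((REGION_DIM, NUM_PSEUDO * EMBED_DIM))
b_embed = np.zeros(NUM_PSEUDO * EMBED_DIM)
pseudo_word_embeddings = (region_feature @ W_embed + b_embed).reshape(
    NUM_PSEUDO, EMBED_DIM
)

# Reading B: pseudo words are discrete indices into a vocabulary.
# The linear layer outputs logits, and each pseudo word is the argmax token.
W_logits = rng.standard_normal((REGION_DIM, NUM_PSEUDO * VOCAB_SIZE))
logits = (region_feature @ W_logits).reshape(NUM_PSEUDO, VOCAB_SIZE)
pseudo_word_indices = logits.argmax(axis=1)

print(pseudo_word_embeddings.shape)  # one embedding vector per pseudo word
print(pseudo_word_indices.shape)     # one integer index per pseudo word
```

In reading A the output could be passed to a text encoder in place of real token embeddings and would stay differentiable end to end. In reading B the argmax step is not differentiable, so training would need some workaround. Which of these (if either) matches Fig. 2(a)?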