yoloe
How to convert semantic features into prompt embeddings
Thank you very much for your work. I have a question about Section 3.3. At the end of that section, we obtain a semantic feature of shape D × H × W. How is it converted into a visual prompt embedding of shape C × D?