
About Visual Prompt Encoder.

Open fuweifu-vtoo opened this issue 6 months ago • 3 comments

Dear author, I have another question for you:

In the Visual Prompt Encoder, are three deformable cross-attention layers stacked first, followed by a single self-attention layer and a single FFN?

Or are three blocks stacked, each consisting of deformable cross-attention + self-attention + FFN?
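
To make the two readings concrete, here is a minimal sketch that only lists the layer order each one would imply. The function names `reading_a` and `reading_b` and the layer labels are hypothetical, chosen for illustration; this does not claim to reflect the actual T-Rex implementation, and it does not say which reading is correct.

```python
# Hypothetical layer labels, used only to compare the two orderings.
CROSS = "deformable_cross_attention"
SELF = "self_attention"
FFN = "ffn"


def reading_a(num_layers=3):
    """Reading A: num_layers cross-attention layers, then one self-attention and one FFN."""
    return [CROSS] * num_layers + [SELF, FFN]


def reading_b(num_blocks=3):
    """Reading B: num_blocks blocks, each being cross-attention + self-attention + FFN."""
    return [CROSS, SELF, FFN] * num_blocks


if __name__ == "__main__":
    print("Reading A:", " -> ".join(reading_a()))
    print("Reading B:", " -> ".join(reading_b()))
```

Under reading A the self-attention and FFN each appear once, after all cross-attention layers; under reading B they appear once per block, interleaved with the cross-attention layers.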

fuweifu-vtoo, Aug 09 '24 09:08