HiSultryMan
Results: 2 issues of HiSultryMan
In your demo code, the dim of q is 64 while the dim of RotaryEmbedding is 32. I checked the code, and q with position index larger than 32 will not be rotate...
Can we just use text as input to enforce the joint learning of image appearance, spatial relationship, and geometry in a unified network?
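The behaviour described in the first issue (a rotary dimension of 32 applied to a query of dimension 64) can be illustrated with a minimal pure-Python sketch. It is not the library's code. The function name `apply_partial_rotary`, the interleaved pairing of features, and the base of 10000 are all assumptions made for illustration. The sketch also reads "position index larger than 32" as the feature index within each query vector, which is an interpretation rather than something the issue states.

```python
import math

def apply_partial_rotary(q, rot_dim, base=10000.0):
    """Rotate the first `rot_dim` features of each query vector.

    q is a list of per-position feature lists (seq_len x dim).
    Hypothetical helper: pairing convention and base are assumptions.
    """
    dim = len(q[0])
    assert rot_dim % 2 == 0 and rot_dim <= dim
    # One rotation frequency per pair of features.
    inv_freq = [base ** (-i / rot_dim) for i in range(0, rot_dim, 2)]
    out = []
    for pos, vec in enumerate(q):
        new_vec = list(vec)  # features at index >= rot_dim are copied as-is
        for k, freq in enumerate(inv_freq):
            angle = pos * freq
            c, s = math.cos(angle), math.sin(angle)
            x1, x2 = vec[2 * k], vec[2 * k + 1]
            new_vec[2 * k] = x1 * c - x2 * s
            new_vec[2 * k + 1] = x1 * s + x2 * c
        out.append(new_vec)
    return out

seq_len, dim, rot_dim = 4, 64, 32
q = [[1.0] * dim for _ in range(seq_len)]
out = apply_partial_rotary(q, rot_dim)

# Features 32..63 carry no positional rotation in this sketch.
tail_unchanged = all(out[p][rot_dim:] == q[p][rot_dim:] for p in range(seq_len))
# Features 0..31 differ from the input at every position after the first.
head_changed = all(out[p][:rot_dim] != q[p][:rot_dim] for p in range(1, seq_len))
```

Under these assumptions, only the first 32 features of each query vector receive positional information, and the remaining 32 pass through untouched. Whether that partial rotation is intended in the demo is the question the issue raises.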