Perceiver_VL
Query about combining modality indicator
Hi,
Thanks for making your work open source! From the paper, I understand that you add all four embeddings (modality, temporal, positional, patch/token) together [Section 3.1]. However, in the codebase, as far as I can tell, you concatenate the modality indicator with the sum of the other three [Here]. I might be missing something basic, so please let me know!
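To make the question concrete, here is a toy sketch of the two options as I understand them. This is not code from the repository: the helper `add_vectors`, the vector values, and the width of 4 are all made up for illustration, and plain Python lists stand in for the embedding tensors.

```python
# Toy illustration only: plain lists stand in for embedding vectors.
# All names, values, and the width 4 are hypothetical, not from the repository.

def add_vectors(*vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(components) for components in zip(*vectors)]

patch = [0.1, 0.2, 0.3, 0.4]       # patch/token embedding
positional = [1.0, 1.0, 1.0, 1.0]  # positional embedding
temporal = [0.5, 0.5, 0.5, 0.5]    # temporal embedding
modality = [2.0, 0.0, 2.0, 0.0]    # modality indicator

# Option A (my reading of Section 3.1): sum all four, so the width is unchanged.
summed = add_vectors(patch, positional, temporal, modality)

# Option B (my reading of the code): sum the other three, then concatenate
# the modality indicator, so the width grows by the modality vector's size.
concatenated = add_vectors(patch, positional, temporal) + modality

print(len(summed), len(concatenated))
```

In option A the modality information is mixed into every dimension, while in option B it occupies its own extra dimensions, so the two are not interchangeable.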
Thanks and Regards, Rutav.