MetaTransformer
Meta-Transformer for Unified Multimodal Learning
In the sample code provided, features are concatenated before being processed in the encoder. features = torch.concat([video_tokenizer(video), audio_tokenizer(audio), time_series_tokenizer(time_data)], dim=1) However, when I ran tokenizers for different modalities, the tokenized shape...
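The issue text above is cut off, so the exact mismatch it reports is not known. As a rough sketch of the general rule involved: concatenating along dim=1 only works when every tensor agrees in all other dimensions (for token tensors, typically batch size and embedding width). The helper name concat_shape and the example shapes below are hypothetical, chosen for illustration only; they are not taken from the repository. Only non-negative dim values are handled.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def concat_shape(shapes: Sequence[Shape], dim: int = 1) -> Shape:
    """Return the shape produced by concatenating along `dim`.

    Hypothetical helper: it mirrors the rule that all inputs must match
    in every dimension except the one being concatenated.
    Raises ValueError when the shapes are incompatible.
    """
    if not shapes:
        raise ValueError("need at least one shape")
    ref = shapes[0]
    total = 0
    for s in shapes:
        if len(s) != len(ref):
            raise ValueError(f"rank mismatch: {s} vs {ref}")
        for axis, (a, b) in enumerate(zip(s, ref)):
            # Only the concatenation axis is allowed to differ.
            if axis != dim and a != b:
                raise ValueError(f"size mismatch on axis {axis}: {s} vs {ref}")
        total += s[dim]
    return ref[:dim] + (total,) + ref[dim + 1:]


# Assumed (batch, tokens, embed_dim) shapes for three tokenizers.
video_shape = (2, 196, 768)
audio_shape = (2, 64, 768)
time_shape = (2, 96, 768)

# Token counts add up along dim=1; batch and embedding width are unchanged.
merged = concat_shape([video_shape, audio_shape, time_shape], dim=1)
print(merged)
```

If one tokenizer returned a different embedding width or a different number of dimensions, this check would raise, which is the kind of situation where a projection or reshape step is needed before concatenating.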
First of all, congratulations on your work! I opened this issue to ask whether you could upload the Data2Seq pre-trained weights; they could be very useful for many researchers. Thanks...
Hi, thanks for your outstanding work! I am trying to use Meta-Transformer for image classification. I noticed that in the paper, you wrote "On image classification, with the help...