satflow
New paper: PolyViT
Detailed Description
https://arxiv.org/abs/2111.12993
It seems similar to Perceiver, which the authors mention as a related model, but the training is a bit different.
It is still a multimodal model, though, so it could also be a good one to try.
Context
Possible Implementation
Very cool! Lots of interesting details about training.
The very last sentence of the paper confuses me though:
"We also do not currently fuse multiple modalities together (ie video and audio) to make a better prediction, and aim to do so in future."
Yeah, I mostly skimmed it, to be honest. It seemed like they train one network on multiple tasks with various inputs, but it handles each of those inputs separately, whereas Perceiver takes them all in together.
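To make that reading concrete, here is a toy sketch of "one shared network, one task at a time". It is only my interpretation, not the paper's code: the task names, the step counts, and the weighted random sampling of tasks are all assumptions for illustration, and the counters stand in for real forward and backward passes.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str       # hypothetical task label
    modality: str   # which tokenizer this task's inputs would go through
    num_steps: int  # sampling weight (made-up numbers)


@dataclass
class UsageLog:
    """Counts how often each component is used, standing in for real updates."""
    shared_encoder: int = 0
    tokenizers: dict = field(default_factory=dict)
    heads: dict = field(default_factory=dict)


def cotrain(tasks, total_steps, seed=0):
    """Sample one task per step; only that task's tokenizer and head are used."""
    rng = random.Random(seed)
    log = UsageLog()
    weights = [t.num_steps for t in tasks]
    for _ in range(total_steps):
        task = rng.choices(tasks, weights=weights, k=1)[0]
        # One modality per step: inputs are never fused across modalities.
        log.tokenizers[task.modality] = log.tokenizers.get(task.modality, 0) + 1
        # The shared encoder is used on every step, whatever the task.
        log.shared_encoder += 1
        log.heads[task.name] = log.heads.get(task.name, 0) + 1
    return log


tasks = [
    Task("image_cls", "image", 300),
    Task("video_cls", "video", 200),
    Task("audio_cls", "audio", 100),
]
log = cotrain(tasks, total_steps=60)
print(log.shared_encoder, log.tokenizers, log.heads)
```

The point of the sketch is just the routing: every step goes through the shared encoder, but only one tokenizer and one head are involved at a time, so video and audio never meet in the same forward pass. A fused model would instead feed several modalities into a single step.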