
New paper: PolyViT

Open jacobbieker opened this issue 3 years ago • 2 comments

Detailed Description

https://arxiv.org/abs/2111.12993

Seems similar to Perceiver, and they mention Perceiver as a related model, but their training is a bit different.

But it is still a multimodal model, so it could also be a good one to try.

Context

Possible Implementation

jacobbieker, Nov 29 '21 22:11

Very cool! Lots of interesting details about training.

The very last sentence of the paper confuses me though:

We also do not currently fuse multiple modalities together (ie video and audio) to make a better prediction, and aim to do so in future.

JackKelly, Nov 30 '21 07:11

Yeah, I mostly skimmed it to be honest. It seemed like they train one network on multiple tasks with various inputs, but it handles each of those separately, whereas Perceiver processes them all together.

jacobbieker, Nov 30 '21 09:11
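
For anyone skimming this thread, here is a rough sketch of the distinction as described in the comments above: one shared network trained on several tasks, where each forward pass handles a single task and modality rather than fusing them. This is not code from the paper or from satflow. The class name, task names, and sizes are all hypothetical, and a plain linear map stands in for the shared transformer encoder to keep the example dependency-free.

```python
import random

DIM = 4  # hypothetical shared embedding width, chosen only for illustration


def make_linear(n_in, n_out, rng):
    """Return a random n_out x n_in weight matrix as nested lists."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def apply_linear(weights, vec):
    """Multiply a weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class SharedEncoderModel:
    """Toy model: per-modality tokenizers, one shared encoder, per-task heads."""

    def __init__(self, input_sizes, task_specs, seed=0):
        rng = random.Random(seed)
        # One input projection ("tokenizer") per modality.
        self.tokenizers = {m: make_linear(n, DIM, rng) for m, n in input_sizes.items()}
        # A single encoder whose weights are shared by every task.
        # (Assumption: a linear map stands in for a transformer encoder.)
        self.encoder = make_linear(DIM, DIM, rng)
        # Each task is tied to exactly one modality and has its own output head.
        self.task_modality = {t: m for t, (m, _) in task_specs.items()}
        self.heads = {t: make_linear(DIM, n_out, rng) for t, (_, n_out) in task_specs.items()}

    def forward(self, task, tokens):
        # A call sees one task and one modality only; nothing is fused.
        modality = self.task_modality[task]
        embedded = [apply_linear(self.tokenizers[modality], tok) for tok in tokens]
        encoded = [apply_linear(self.encoder, emb) for emb in embedded]
        # Mean-pool over tokens, then apply the task-specific head.
        pooled = [sum(col) / len(encoded) for col in zip(*encoded)]
        return apply_linear(self.heads[task], pooled)


model = SharedEncoderModel(
    input_sizes={"image": 6, "audio": 3},
    task_specs={"image_cls": ("image", 5), "audio_cls": ("audio", 2)},
)
image_logits = model.forward("image_cls", [[0.5] * 6 for _ in range(3)])
audio_logits = model.forward("audio_cls", [[0.1] * 3 for _ in range(7)])
```

Both calls go through the same `encoder` weights, but image and audio tokens never appear in the same forward pass. A fused approach would instead feed tokens from several modalities into one call.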