
Decoder only text reconstruction

Open • AmitMY opened this issue 3 years ago • 1 comment

In your work, it seems the decoder is used only for pre-training and is then discarded, so downstream tasks use only the encoder.

Is there a way for the decoder to predict text and recover the original Unicode representation of that text? Or is an OCR model required, perhaps one trained in conjunction with the model?

AmitMY, Jul 15 '22 06:07

Currently, it is not possible to go from reconstructed image patches to Unicode text, so this is an open problem (we mention it briefly in the limitations section of our paper). An OCR decoder could certainly work, and we plan to investigate possible solutions in the future.

xplip, Jul 15 '22 15:07