
Unsupervised pre-training of recognizer

This is an implementation of unsupervised pretraining of recognition model weights based on an image-inpainting surrogate task: randomly sampled patches of the initial convolutional feature maps are replaced with a learnable embedding, and the model is trained to reconstruct them. Training uses a contrastive loss in which negative samples are drawn randomly from the unmasked parts of the sequence.

Apart from the sampling method, the implementation is mostly a faithful adaptation of:

Vogler, Nikolai, et al. "Lacuna Reconstruction: Self-supervised Pre-training for Low-Resource Historical Document Transcription." arXiv preprint arXiv:2112.08692 (2021).
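
For illustration, here is a minimal PyTorch sketch of the surrogate task under simplifying assumptions: it masks individual time steps independently rather than the contiguous spans a Wav2Vec2-style masking layer samples, and uses a cosine-similarity InfoNCE loss. All names, shapes, and hyperparameters are illustrative, not kraken's actual code:

```python
import torch
import torch.nn.functional as F


def mask_features(feats: torch.Tensor, mask_emb: torch.Tensor, mask_prob: float = 0.15):
    """Replace randomly sampled time steps of `feats` (N, T, C) with the learnable `mask_emb` (C,)."""
    n, t, _ = feats.shape
    mask = torch.rand(n, t, device=feats.device) < mask_prob
    masked = feats.clone()
    masked[mask] = mask_emb.to(feats.dtype)  # broadcast the embedding over all masked positions
    return masked, mask


def contrastive_loss(preds: torch.Tensor, targets: torch.Tensor, mask: torch.Tensor,
                     n_negatives: int = 50, temp: float = 0.1) -> torch.Tensor:
    """InfoNCE loss at masked positions, with negatives sampled from unmasked time steps."""
    losses = []
    for b in range(preds.shape[0]):
        pos_idx = mask[b].nonzero(as_tuple=True)[0]
        neg_pool = (~mask[b]).nonzero(as_tuple=True)[0]
        if not len(pos_idx) or not len(neg_pool):
            continue
        for i in pos_idx:
            negs = neg_pool[torch.randint(len(neg_pool), (n_negatives,))]
            # candidate 0 is the true (positive) feature vector, the rest are distractors
            cands = torch.cat([targets[b, i].unsqueeze(0), targets[b, negs]])
            logits = F.cosine_similarity(preds[b, i].unsqueeze(0), cands) / temp
            target = torch.zeros(1, dtype=torch.long, device=preds.device)
            losses.append(F.cross_entropy(logits.unsqueeze(0), target))
    return torch.stack(losses).mean()


# toy check: 2 lines, 64 time steps, 128 feature channels; identity "encoder"
feats = torch.randn(2, 64, 128)
mask_emb = torch.nn.Parameter(torch.randn(128))
masked, mask = mask_features(feats, mask_emb)
loss = contrastive_loss(masked, feats, mask)
```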

Status of the PR

  • [X] Dataset support
  • [X] Wav2Vec2 masking layer
  • [X] Pytorch-lightning model (see the skeleton after this list)
  • [X] CLI driver
  • [X] Auxiliary layers in VGSL model
  • [ ] Deserialization of auxiliary layers
  • [ ] Docs
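
A hypothetical skeleton of how the pytorch-lightning model from the checklist might wrap the helpers sketched above; the class name, batch keys, and optimizer settings are assumptions, not kraken's actual implementation:

```python
import pytorch_lightning as pl
import torch


class PretrainModule(pl.LightningModule):
    """Hypothetical pretraining wrapper; names and defaults are illustrative."""

    def __init__(self, encoder: torch.nn.Module, feat_dim: int = 128):
        super().__init__()
        self.encoder = encoder  # recognizer stack above the initial conv block
        self.mask_emb = torch.nn.Parameter(torch.randn(feat_dim))

    def training_step(self, batch, batch_idx):
        feats = batch['features']  # (N, T, C) initial convolutional feature maps
        masked, mask = mask_features(feats, self.mask_emb)
        preds = self.encoder(masked)
        loss = contrastive_loss(preds, feats, mask)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-4)
```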
