kraken
Unsupervised pre-training of the recognizer
This is an implementation of unsupervised pretraining of recognition model weights based on an image inpainting surrogate task. Randomly sampled patches of the initial convolutional feature maps are masked by replacing them with a learnable embedding, and the model is trained to reconstruct them with a contrastive loss in which negative samples are drawn at random from the unmasked parts of the sequence.
Apart from the sampling method, the implementation is largely a faithful adaptation of:
Vogler, Nikolai, et al. "Lacuna Reconstruction: Self-supervised Pre-training
for Low-Resource Historical Document Transcription." arXiv preprint
arXiv:2112.08692 (2021).
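The masking and contrastive objective described above can be sketched roughly as follows. This is a minimal illustration, not kraken's actual code: the function names, span width, negative count, and temperature are all assumptions, and the real implementation operates on batched line images through the VGSL network.

```python
import torch
import torch.nn.functional as F

def mask_spans(seq_len, mask_prob=0.065, mask_width=4):
    # Wav2Vec2-style span masking: each time step starts a masked span of
    # `mask_width` frames with probability `mask_prob`. Spans may overlap.
    starts = torch.rand(seq_len) < mask_prob
    mask = torch.zeros(seq_len, dtype=torch.bool)
    for s in starts.nonzero(as_tuple=True)[0]:
        mask[s:s + mask_width] = True
    return mask

def contrastive_loss(context, targets, mask, n_negatives=8, temperature=0.1):
    # context: (T, D) outputs of the contextual head over the masked sequence
    # targets: (T, D) projections of the unmasked feature maps to reconstruct
    masked_idx = mask.nonzero(as_tuple=True)[0]
    unmasked_idx = (~mask).nonzero(as_tuple=True)[0]
    preds = context[masked_idx]   # (M, D) predictions at masked positions
    pos = targets[masked_idx]     # (M, D) positives: the original features
    # Negatives are sampled uniformly from the unmasked part of the sequence.
    neg_idx = unmasked_idx[
        torch.randint(len(unmasked_idx), (len(masked_idx), n_negatives))
    ]
    negs = targets[neg_idx]       # (M, N, D)
    cands = torch.cat([pos.unsqueeze(1), negs], dim=1)  # (M, 1 + N, D)
    logits = F.cosine_similarity(preds.unsqueeze(1), cands, dim=-1) / temperature
    # InfoNCE: the positive sits at index 0 of every candidate set.
    return F.cross_entropy(
        logits, torch.zeros(len(masked_idx), dtype=torch.long)
    )
```

Minimizing this loss pushes the context network's output at each masked position towards the feature vector that was masked out, and away from features sampled elsewhere in the line.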
Status of the PR
- [X] Dataset support
- [X] Wav2Vec2 masking layer
- [X] Pytorch-lightning model
- [X] CLI driver
- [X] Auxiliary layers in VGSL model
- [ ] Deserialization of auxiliary layers
- [ ] Docs