Mimick
Transformer Models
Hi, is it possible to integrate Mimick with transformer-based models, such as a BERT variant?
Hi, can you please give more details? Are you referring to replacing the Mimick LSTM with a transformer, or applying the Mimick idea within a BERT-like model?
For the latter, several non-trivial issues would need to be solved, such as how to handle the MLM pre-training objective and words that are split into multiple tokens. One solution is described in this preprint, although its code release is unfortunately still delayed.
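To make the multi-token issue concrete, here is a minimal sketch in plain Python. It is only an illustration, not the preprint's method: the toy vocabulary, the `wordpiece` helper, and the mean-pooling step are all assumptions made up for this example. It shows that a subword tokenizer can split one word into several pieces, so there is no single input vector for a Mimick-style model to predict unless the pieces are pooled in some way.

```python
from typing import List, Set

# Hypothetical toy vocabulary; a real BERT vocabulary is far larger.
TOY_VOCAB: Set[str] = {"play", "##ful", "##ness", "the", "[UNK]"}


def wordpiece(word: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation-piece marker
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # no piece fits: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def mean_pool(vectors: List[List[float]]) -> List[float]:
    """Average several piece vectors into one word-level vector (one possible choice)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


pieces = wordpiece("playfulness", TOY_VOCAB)
print(pieces)  # ['play', '##ful', '##ness']
```

A word such as "the" stays a single piece, while "playfulness" becomes three, so any word-level target has to be defined over a variable number of piece embeddings. Mean pooling is just one simple option among several.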