Documentation of BERT command line options

Open tomsbergmanis opened this issue 3 years ago • 4 comments

Marian's command line options include:

  --bert-mask-symbol TEXT=[MASK]       Masking symbol for BERT masked-LM training
  --bert-sep-symbol TEXT=[SEP]         Sentence separator symbol for BERT next sentence prediction training
  --bert-class-symbol TEXT=[CLS]       Class symbol BERT classifier training
  --bert-masking-fraction FLOAT=0.15   Fraction of masked out tokens during training
  --bert-train-type-embeddings=true    Train bert type embeddings, set to false to use static sinusoidal embeddings
  --bert-type-vocab-size INT=2         Size of BERT type vocab (sentence A and B)

Yet I cannot find an example of how, and for what, these can be used. It would be good to have a simple use-case example showing how to use these options for BERT pretraining.

tomsbergmanis · Oct 20 '21 10:10

Even a code example in a comment below would be really good!

tomsbergmanis · Oct 21 '21 12:10

Hi Toms, I've never used these options and don't know what they actually do. Pinging @emjotde, who may know. It might be that this was an experimental feature.

snukky · Oct 27 '21 10:10
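
For anyone reading along: the thread does not say how Marian applies these options internally, so the following is only a rough Python sketch of what the option descriptions suggest conceptually, not Marian's implementation. The function names `build_pair` and `mask_tokens` are hypothetical. The placement of `[CLS]` and `[SEP]` follows the usual BERT convention, which is an assumption here, and the masking is simplified: every selected token becomes `[MASK]`, without the random or unchanged replacements some BERT recipes use.

```python
import random

# Default symbols as listed in the option descriptions in this issue.
MASK, SEP, CLS = "[MASK]", "[SEP]", "[CLS]"


def build_pair(tokens_a, tokens_b):
    """Join two sentences in the usual BERT layout (assumed, not confirmed for Marian).

    Returns the token sequence and one type id per token:
    0 for sentence A (including [CLS] and the first [SEP]), 1 for sentence B.
    Two type ids correspond to a type vocab size of 2.
    """
    tokens = [CLS] + tokens_a + [SEP] + tokens_b + [SEP]
    type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, type_ids


def mask_tokens(tokens, fraction=0.15, rng=None):
    """Replace roughly `fraction` of the non-special tokens with the mask symbol.

    Returns the masked sequence and a dict mapping each masked position
    to the original token (the masked-LM prediction targets).
    """
    rng = rng or random.Random()
    # Never mask the class or separator symbols.
    candidates = [i for i, t in enumerate(tokens) if t not in (CLS, SEP)]
    n = max(1, round(len(candidates) * fraction)) if candidates else 0
    chosen = set(rng.sample(candidates, n))
    masked = [MASK if i in chosen else t for i, t in enumerate(tokens)]
    labels = {i: tokens[i] for i in chosen}
    return masked, labels


if __name__ == "__main__":
    toks, types = build_pair("the cat sat on the mat".split(), "it was warm".split())
    masked, labels = mask_tokens(toks, fraction=0.15, rng=random.Random(0))
    print(masked)
    print(types)
    print(labels)
```

In this sketch `fraction` plays the role of `--bert-masking-fraction`, and the two type ids mirror the "sentence A and B" wording of `--bert-type-vocab-size`. How Marian itself consumes these options during training would still need confirmation from the maintainers.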