
pretrain

Open drewskidang opened this issue 1 year ago • 2 comments

Is there a way to pretrain the M3 models?

drewskidang avatar May 18 '24 18:05 drewskidang

You can pretrain m3 following this example: https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/pretrain

staoxiao avatar May 19 '24 08:05 staoxiao

@staoxiao would the same script work for M3?

drewskidang avatar May 19 '24 20:05 drewskidang

Yes, m3 and the other models share the same pretraining script.

staoxiao avatar May 20 '24 11:05 staoxiao

I get this warning when running the script:

/usr/local/lib/python3.10/dist-packages/transformers/data/data_collator.py:1019: UserWarning: DataCollatorForWholeWordMask is only suitable for BertTokenizer-like tokenizers. Please refer to the documentation for more information.

Should I just ignore this, then?

drewskidang avatar May 21 '24 15:05 drewskidang
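
For context on the warning above, here is a minimal sketch of why whole-word masking is tied to BERT-style tokenizers. It assumes the grouping relies on the WordPiece "##" continuation prefix. The function `group_whole_words` and both token lists are hypothetical illustrations, not the transformers or FlagEmbedding code, and the real tokenizers may split these words differently.

```python
def group_whole_words(tokens, special=("[CLS]", "[SEP]")):
    """Group token indices into words using the WordPiece '##' convention.

    Simplified illustration only: a token starting with '##' is treated as a
    continuation of the previous word; every other token starts a new word.
    """
    groups = []
    for i, tok in enumerate(tokens):
        if tok in special:
            continue  # special tokens are never masking candidates here
        if groups and tok.startswith("##"):
            groups[-1].append(i)  # continuation piece joins the current word
        else:
            groups.append([i])  # new word starts
    return groups


# Hypothetical WordPiece-style tokens (BERT-like tokenizer).
wordpiece = ["pre", "##train", "##ing", "works"]

# Hypothetical SentencePiece-style tokens, where a leading "▁" marks a word
# start and continuation pieces carry no "##" prefix.
sentencepiece = ["▁pre", "train", "ing", "▁works"]

print(group_whole_words(wordpiece))
print(group_whole_words(sentencepiece))
```

Under that assumption, the WordPiece tokens are grouped into two words, while each SentencePiece token ends up in its own group, so masking would act on individual tokens rather than whole words. Whether that is acceptable for M3 pretraining is not confirmed in this thread.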