FlagEmbedding
pretrain
Is there a way to pretrain the M3 models?
You can pretrain M3 by following this example: https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/pretrain
@staoxiao would the same script work?
Yes, M3 and the other models share the same pretraining script.
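For reference, a minimal launch sketch based on the examples/pretrain README linked above (the exact module path and flags may have changed since, and using BAAI/bge-m3 as the base model for the M3 case is an assumption; check the README for the current invocation):

```bash
# RetroMAE-style pretraining launch, adapted from examples/pretrain.
# Adjust --nproc_per_node to your GPU count and point --train_data
# at your own jsonl corpus.
torchrun --nproc_per_node 8 \
  -m FlagEmbedding.baai_general_embedding.retromae_pretrain.run \
  --output_dir ./pretrained-m3 \
  --model_name_or_path BAAI/bge-m3 \
  --train_data toy_pretrain_data.jsonl \
  --learning_rate 2e-5 \
  --num_train_epochs 2 \
  --per_device_train_batch_size 32 \
  --max_seq_length 512
```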
When I run it I get this warning:

/usr/local/lib/python3.10/dist-packages/transformers/data/data_collator.py:1019: UserWarning: DataCollatorForWholeWordMask is only suitable for BertTokenizer-like tokenizers. Please refer to the documentation for more information.

Should I just ignore this, then?
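For context, a minimal sketch that reproduces the warning with the M3 tokenizer (assuming the stock transformers DataCollatorForWholeWordMask and the released BAAI/bge-m3 checkpoint). The collator groups subwords into words via WordPiece's "##" continuation prefix; M3's XLM-RoBERTa tokenizer is sentencepiece-based and never emits that prefix, so whole-word masking effectively degenerates to per-token masking:

```python
from transformers import AutoTokenizer, DataCollatorForWholeWordMask

# M3 is built on XLM-RoBERTa, whose sentencepiece tokenizer is not
# BertTokenizer-like, which is what the UserWarning is about.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15)

# The warning fires here, when a batch is collated: the collator looks for
# the "##" prefix to keep subwords of one word together, finds none, and
# so treats every token as its own "word".
batch = collator([tokenizer("a minimal example sentence")])
print(batch["input_ids"].shape, batch["labels"].shape)
```

The warning itself does not stop training; the real question to weigh is whether per-token masking is an acceptable substitute for true whole-word masking in your pretraining run.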