UER-py
Segmentation fault when using sentencepiece
Commands used (word-based pretraining with sentencepiece):
python3 preprocess.py --corpus_path corpora/book_review.txt \
--spm_model_path models/cluecorpussmall_spm.model \
--dataset_path book_review_word_sentencepiece_dataset.pt \
--processes_num 8 --seq_length 128 --dynamic_masking \
--data_processor mlm
python3 pretrain.py --dataset_path book_review_word_sentencepiece_dataset.pt \
--spm_model_path models/cluecorpussmall_spm.model \
--output_model_path models/book_review_word_sentencepiece_model.bin \
--world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
--total_steps 5000 --save_checkpoint_steps 2500 --report_steps 500 \
--learning_rate 1e-4 --batch_size 64 \
--tie_weights
The run fails with the following error:
Segmentation fault
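
Since the message above does not say which step crashes or where, one way to narrow it down is to rerun each command with Python's built-in faulthandler enabled, which prints a Python traceback when the interpreter receives SIGSEGV. The snippet below is only a minimal sketch: `run_with_faulthandler` is a hypothetical helper name (not part of UER-py), and it assumes a POSIX system, where a negative return code means the child was killed by a signal.

```python
import os
import signal
import subprocess
import sys


def run_with_faulthandler(script_args):
    """Run `python <script_args>` with faulthandler enabled (hypothetical helper).

    Returns (crashed, stderr_text), where `crashed` is True if the child
    process was terminated by SIGSEGV.
    """
    # PYTHONFAULTHANDLER=1 turns on faulthandler at interpreter startup;
    # as an environment variable it is also inherited by worker processes.
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    result = subprocess.run(
        [sys.executable, *script_args],
        env=env,
        capture_output=True,
        text=True,
    )
    # On POSIX, subprocess reports death-by-signal as a negative return code.
    crashed = result.returncode == -signal.SIGSEGV
    return crashed, result.stderr
```

Passing the arguments of the preprocess.py command and then those of the pretrain.py command as `script_args` should show which of the two steps segfaults; the returned stderr text would then contain the traceback, if faulthandler was able to print one.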