academic-budget-bert
only test_shard_*.hdf5
Hi, after running
python generate_samples.py \
--dir ./enwiki_books_shards_merge \
-o ./enwiki_books_samples \
--dup_factor 10 \
--seed 42 \
--vocab_file ./vocab.txt \
--do_lower_case 1 \
--masked_lm_prob 0.15 \
--max_seq_length 128 \
--model_name bert-large-uncased \
--max_predictions_per_seq 20 \
--n_processes 32
I only got test_shard_*.hdf5 files in ./enwiki_books_samples. No train_shard_*.hdf5.
Do you have any ideas?
Thanks!
Just rename some of the test_shard_*.hdf5 files to train_shard_*.hdf5, according to the train/test ratio you need. That's how I solved it.
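In case it helps, here is a rough sketch of that rename step in Python. It is not part of the repo: the helper name rename_shards, the train_ratio argument, and the dry_run switch are all made up for illustration. It assumes the output files match test_shard_*.hdf5 and that a train shard should simply be called train_shard_<same suffix>.hdf5. It only changes file names and does not open the HDF5 files.

```python
from pathlib import Path


def rename_shards(sample_dir, train_ratio, dry_run=True):
    """Rename a share of test_shard_*.hdf5 files to train_shard_*.hdf5.

    Hypothetical helper, not part of academic-budget-bert.
    Returns the list of (old_path, new_path) pairs that were (or would be) renamed.
    """
    if not 0.0 <= train_ratio <= 1.0:
        raise ValueError("train_ratio must be between 0 and 1")

    # Sort by name so the selection is reproducible between runs.
    shards = sorted(Path(sample_dir).glob("test_shard_*.hdf5"))

    # Number of shards to turn into train shards, rounded to the nearest integer.
    n_train = round(len(shards) * train_ratio)

    pairs = []
    for old in shards[:n_train]:
        new = old.with_name(old.name.replace("test_shard_", "train_shard_", 1))
        if new.exists():
            # Refuse to overwrite a shard that is already there.
            raise FileExistsError(f"{new} already exists")
        if not dry_run:
            old.rename(new)
        pairs.append((old, new))
    return pairs


if __name__ == "__main__":
    # Dry run first: print the planned renames without touching any file.
    for old, new in rename_shards("./enwiki_books_samples", train_ratio=0.9):
        print(f"{old.name} -> {new.name}")
```

Run it once as is to check the plan, then call it with dry_run=False to actually rename. The 0.9 ratio is only a placeholder, pick whatever split you need.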