academic-budget-bert

only test_shard_*.hdf5

Open · shizhediao opened this issue 2 years ago · 1 comment

Hi, after running

python generate_samples.py \
    --dir ./enwiki_books_shards_merge \
    -o ./enwiki_books_samples \
    --dup_factor 10 \
    --seed 42 \
    --vocab_file ./vocab.txt \
    --do_lower_case 1 \
    --masked_lm_prob 0.15 \
    --max_seq_length 128 \
    --model_name bert-large-uncased \
    --max_predictions_per_seq 20 \
    --n_processes 32

I only got test_shard_*.hdf5 files in ./enwiki_books_samples and no train_shard_*.hdf5 files. Do you have any idea why? Thanks!

shizhediao, Apr 12 '22 05:04

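Just rename the generated files according to the train/test ratio you need, i.e. rename a share of the test_shard_*.hdf5 files to train_shard_*.hdf5. That's how I solved it.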

marcelbra, Apr 21 '22 12:04
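The renaming workaround from the reply can be scripted. The following is a minimal sketch, assuming that every shard generate_samples.py wrote is named test_shard_<N>.hdf5 and that the pretraining step only needs the train_shard_ prefix to pick a file up. `split_shards` is a hypothetical helper, not part of academic-budget-bert, and it does not explain why the generator skipped the train shards in the first place. It picks round(ratio × count) shards at random (with a fixed seed so the split is reproducible) and renames them.

```python
import random
from pathlib import Path


def split_shards(sample_dir, train_ratio=0.9, seed=42, dry_run=False):
    """Rename a share of test_shard_*.hdf5 files to train_shard_*.hdf5.

    Hypothetical helper. Assumes every shard is named test_shard_<N>.hdf5
    and that the training loader looks for the train_shard_ prefix.
    Returns the list of (old_path, new_path) pairs.
    """
    if not 0.0 <= train_ratio <= 1.0:
        raise ValueError("train_ratio must be between 0 and 1")

    sample_dir = Path(sample_dir)
    shards = sorted(sample_dir.glob("test_shard_*.hdf5"))
    n_train = round(len(shards) * train_ratio)

    # Random but reproducible choice, so train and test shards are mixed
    # rather than taken as one contiguous block of shard indices.
    chosen = sorted(random.Random(seed).sample(shards, n_train))

    # Plan every rename first and check for collisions, so a clash
    # aborts before any file has been touched.
    renames = []
    for old in chosen:
        new = old.with_name("train_" + old.name[len("test_"):])
        if new.exists():
            raise FileExistsError(f"{new} already exists")
        renames.append((old, new))

    if not dry_run:
        for old, new in renames:
            old.rename(new)
    return renames
```

For example, `split_shards("./enwiki_books_samples", train_ratio=0.9, dry_run=True)` only returns the planned renames, so you can check the split before running it again without `dry_run`.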