academic-budget-bert issues

The training process will get stuck after training for one epoch

10

Hi, @peteriz, it seems there is an issue if the line global_rank = 0 is deleted. With different workers reading different shards, the total number of iterations for each worker in...

leoozy
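Here is a minimal sketch of the failure mode this issue describes. It assumes a data-parallel run in which every training step ends in a collective all-reduce. If each rank reads a different shard, the ranks can end up with different numbers of batches. The rank that runs out first stops calling the collective, and the other ranks block waiting for it. The helper names (`steps_per_rank`, `common_step_count`, `blocked_ranks`) and the shard sizes are hypothetical and not part of academic-budget-bert. The code only simulates the arithmetic.

```python
import math


def steps_per_rank(samples_per_rank, batch_size):
    """Number of batches each rank would iterate over (a partial last batch counts)."""
    return [math.ceil(n / batch_size) for n in samples_per_rank]


def common_step_count(samples_per_rank, batch_size):
    """Step count every rank can run without waiting on a peer that already finished.

    In a real torch.distributed job each rank holds only its own count and would
    agree on this value with dist.all_reduce(t, op=dist.ReduceOp.MIN); here the
    reduction is simulated with min() over all ranks.
    """
    return min(steps_per_rank(samples_per_rank, batch_size))


def blocked_ranks(samples_per_rank, batch_size):
    """Ranks that would still be waiting in a collective after the shortest rank stops."""
    steps = steps_per_rank(samples_per_rank, batch_size)
    shortest = min(steps)
    return [rank for rank, s in enumerate(steps) if s > shortest]


if __name__ == "__main__":
    shard_sizes = [1000, 960, 1010]  # hypothetical number of samples in each rank's shard
    print(steps_per_rank(shard_sizes, 32))     # [32, 30, 32]
    print(common_step_count(shard_sizes, 32))  # 30
    print(blocked_ranks(shard_sizes, 32))      # [0, 2]
```

Two fixes keep the collectives in lockstep: cap every rank at the agreed minimum, or make all shards the same length. The issue does not say which of these, if either, the repository adopts.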

What is the size of the processed data?

1

Hello, I processed Wikipedia and BookCorpus using your scripts. The total size of the processed Wikipedia dataset is around 106G (~2650 hdf5 files). Could you please tell me whether...

leoozy

The eval_acc on the RTE dataset is only 55%

1

Hello, thank you for your code. I tried to run your code with the following command: aim=pretraining_experiment-bert-mlm--23000 deepspeed --include=localhost:0,1,2,3,4,5,6,7 --master_port 64000 run_pretraining.py \ --model_type bert-mlm --tokenizer_name bert-base-uncased \ --hidden_act gelu...

leoozy

Distributed pretraining dataset question

3

https://github.com/IntelLabs/academic-budget-bert/blob/ea000838156e3be251699ad6a3c8b1339c76e987/pretraining/dataset/distributed_pretraining_dataset.py#L280 In the above line, the global_rank is set to 0 for all workers, meaning that the function will return the same file_index for all the workers. If world_size =...

sangmichaelxie
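This sketch illustrates the point raised in this issue. It assumes a simple round-robin scheme in which each worker's shard index depends on its global rank. The formula in `shard_index` is hypothetical and is not the code at the linked line of distributed_pretraining_dataset.py. It only shows why pinning global_rank to 0 makes every worker open the same file.

```python
def shard_index(epoch, global_rank, world_size, num_files):
    """Hypothetical round-robin mapping from (epoch, rank) to an hdf5 shard index."""
    return (epoch * world_size + global_rank) % num_files


def indices_for_all_ranks(epoch, world_size, num_files, force_rank_zero=False):
    """Shard index each rank would open; force_rank_zero mimics `global_rank = 0`."""
    return [
        shard_index(epoch, 0 if force_rank_zero else rank, world_size, num_files)
        for rank in range(world_size)
    ]


if __name__ == "__main__":
    # 4 workers and 10 shard files, numbers chosen only for illustration
    print(indices_for_all_ranks(0, 4, 10))                        # [0, 1, 2, 3]
    print(indices_for_all_ranks(0, 4, 10, force_rank_zero=True))  # [0, 0, 0, 0]
    print(indices_for_all_ranks(1, 4, 10))                        # [4, 5, 6, 7]
```

With the real rank, workers read distinct shards. Those shards need not contain the same number of samples, though. Removing the override therefore trades duplicated data for the uneven per-worker iteration counts reported in the first issue.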

Fine-tuning commands for other GLUE tasks

1

Hi, can you share the fine-tuning commands you used for the other GLUE tasks? Did you use the same warmup, hyperparameters, etc. as in the example MRPC command you shared?

raghavlite

only test_shard_*.hdf5

1

Hi, after running ``` python generate_samples.py \ --dir ./enwiki_books_shards_merge \ -o ./enwiki_books_samples \ --dup_factor 10 \ --seed 42 \ --vocab_file ./vocab.txt \ --do_lower_case 1 \ --masked_lm_prob 0.15 \ --max_seq_length 128...

shizhediao

GLUE dev results

1

Is it possible for you to show your GLUE development results in the repo's README file? That way, we could use them as a baseline without submitting to the...

BaohaoLiao

Segment article into sentences using multiprocessing queue

### Segment article into sentences using multiprocessing queue **Logic** - Divide the data evenly among the number of processes. - Each child process takes the chunk of data and processes...

amitkvikram
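The two steps listed in this pull request can be sketched with Python's standard `multiprocessing` module. This is a minimal sketch, not the PR's code. The regex-based `split_sentences` is a naive stand-in for a real sentence segmenter, and all helper names are hypothetical. The sketch also assumes the `fork` start method, so it targets POSIX systems.

```python
import multiprocessing as mp
import re

# Naive stand-in for a real sentence segmenter: split after . ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(article):
    return [s for s in _SENTENCE_END.split(article.strip()) if s]


def even_chunks(items, num_chunks):
    """Split items into num_chunks contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def _segment_chunk(chunk_id, articles, queue):
    # Child process: segment its chunk and send the result back tagged with its chunk id.
    queue.put((chunk_id, [split_sentences(a) for a in articles]))


def segment_articles(articles, num_procs=4, start_method="fork"):
    ctx = mp.get_context(start_method)
    queue = ctx.Queue()
    procs = [
        ctx.Process(target=_segment_chunk, args=(i, chunk, queue))
        for i, chunk in enumerate(even_chunks(articles, num_procs))
    ]
    for p in procs:
        p.start()
    # Drain the queue before join(): a child does not exit until its queued data
    # has been flushed to the pipe, so joining first can deadlock.
    results = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    # Results arrive in completion order; restore the original article order.
    results.sort(key=lambda item: item[0])
    return [sentences for _, chunk in results for sentences in chunk]


if __name__ == "__main__":
    docs = ["First one. Second one!", "Only sentence.", "Is it? Yes."]
    for sentences in segment_articles(docs, num_procs=2):
        print(sentences)
```

Tagging each result with its chunk id lets the parent collect results in whatever order the children finish and still return one list of sentences per article in the input order.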

The file produced by process_data.py is empty

Thanks for your awesome work and detailed README! However, when I perform preprocessing with process_data.py, the output directory and the file `wiki_one_article_per_line.txt` are empty. I think the input file of process_data.py...

Richar-Du

Bump transformers from 4.4.0 to 4.30.0

Bumps [transformers](https://github.com/huggingface/transformers) from 4.4.0 to 4.30.0. Release notes Sourced from transformers's releases. v4.30.0: 100k, Agents improvements, Safetensors core dependency, Swiftformer, Autoformer, MobileViTv2, timm-as-a-backbone 100k Transformers has just reached 100k stars...

dependabot[bot]

dependencies
