academic-budget-bert

Repository containing code for "How to Train BERT with an Academic Budget" paper

Results 11 academic-budget-bert issues

Hi @peteriz, there seems to be an issue when deleting the line `global_rank = 0`. With different workers reading different shards, the total number of iterations for each worker in...

Hello, I processed the Wikipedia and BookCorpus datasets using your scripts. The total size of the processed Wikipedia dataset is around 106G (~2650 hdf5 files). Could you please tell me whether...

Hello, thank you for your code. I tried to run your code with the following command: aim=pretraining_experiment-bert-mlm--23000 deepspeed --include=localhost:0,1,2,3,4,5,6,7 --master_port 64000 run_pretraining.py \ --model_type bert-mlm --tokenizer_name bert-base-uncased \ --hidden_act gelu...

https://github.com/IntelLabs/academic-budget-bert/blob/ea000838156e3be251699ad6a3c8b1339c76e987/pretraining/dataset/distributed_pretraining_dataset.py#L280 In the above line, the global_rank is set to 0 for all workers, meaning that the function will return the same file_index for all the workers. If world_size =...
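The effect described in this report can be illustrated with a minimal sketch. The function `shard_index` and its formula are hypothetical and are not taken from the repository; they only assume that the shard a worker reads is derived from its rank and the world size.

```python
def shard_index(step, global_rank, world_size, num_files):
    """Hypothetical rank-aware shard selection (an assumption, not the repo's code)."""
    return (step * world_size + global_rank) % num_files


world_size, num_files = 4, 10

# If global_rank is forced to 0, every worker computes the same index.
forced = [shard_index(0, 0, world_size, num_files) for _ in range(world_size)]

# If each worker passes its own rank, the indices differ per worker.
distinct = [shard_index(0, rank, world_size, num_files) for rank in range(world_size)]

print(forced, distinct)
```

Under this assumption, forcing the rank to 0 makes all workers read the same file, while using the real rank gives each worker its own shard. The per-worker iteration count can then differ when shards hold different numbers of samples.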

Hi, can you share the fine-tuning commands you used for the other GLUE tasks? Did you use the same warmup, hyperparameters, etc. as in the example MRPC command you shared?

Hi, after running ``` python generate_samples.py \ --dir ./enwiki_books_shards_merge \ -o ./enwiki_books_samples \ --dup_factor 10 \ --seed 42 \ --vocab_file ./vocab.txt \ --do_lower_case 1 \ --masked_lm_prob 0.15 \ --max_seq_length 128...

Is it possible for you to show your GLUE development results in the repo's README file? In this case, we can use it as a baseline without submitting to the...

### Segment article into sentences using multiprocessing queue **Logic** - Divide the data evenly among the number of processes. - Each child process takes the chunk of data and processes...
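The logic outlined in this proposal might look roughly like the sketch below. All names (`split_evenly`, `segment`, `worker`, `segment_parallel`) are hypothetical, and the regex splitter is only a naive stand-in for whatever sentence segmenter the actual change uses.

```python
import multiprocessing as mp
import re


def split_evenly(items, n_chunks):
    """Split items into n_chunks contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def segment(article):
    """Naive sentence splitter (assumption: a stand-in for a real segmenter)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", article.strip()) if s]


def worker(chunk_id, chunk, queue):
    """Child process: segment every article in the chunk and report via the queue."""
    queue.put((chunk_id, [segment(article) for article in chunk]))


def segment_parallel(articles, n_procs):
    """Segment articles in n_procs child processes, preserving input order."""
    queue = mp.Queue()
    chunks = split_evenly(articles, n_procs)
    procs = [mp.Process(target=worker, args=(i, c, queue)) for i, c in enumerate(chunks)]
    for p in procs:
        p.start()
    # Drain the queue before joining so children are not blocked on a full pipe.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return [sentences for i in range(n_procs) for sentences in results[i]]


if __name__ == "__main__":
    demo = ["First one. Second one!", "Only one sentence.", "A? B. C."]
    print(segment_parallel(demo, n_procs=2))
```

Tagging each result with its chunk id lets the parent restore the original article order, since children may finish in any order.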

Thanks for your awesome work and detailed README! However, when I perform preprocessing with process_data.py, the output directory and the file `wiki_one_article_per_line.txt` are empty. I think the input file of process_data.py...

Bumps [transformers](https://github.com/huggingface/transformers) from 4.4.0 to 4.30.0. Release notes Sourced from transformers's releases. v4.30.0: 100k, Agents improvements, Safetensors core dependency, Swiftformer, Autoformer, MobileViTv2, timm-as-a-backbone 100k Transformers has just reached 100k stars...
