BERT4doc-Classification
Code and source for the paper ``How to Fine-Tune BERT for Text Classification?``
When doing further pre-training on my own data, the perplexity is very high, for example 709. I have 3,582,619 examples and use batch size=8, epochs=3, learning rate=5e-5. Is there...
I got this error when doing further pre-training. My environment: Ubuntu 18.04.4 LTS (GNU/Linux 5.4.0-74-generic x86_64), GPU: 2080 Ti. I use the following command: `python run_pretraining.py \ --input_file=./tmp/tf_AGnews.tfrecord \ --output_dir=./uncased_L-12_H-768_A-12_AGnews_pretrain \ --do_train=True \`...
Hi, I followed your code to further pre-train a BERT model on my own corpus, but I got only checkpoint files without any config or vocab.txt file. Any ideas, please?...
Hi, first, thank you for sharing your code with us. I am trying to further pre-train a BERT model on my own corpus on a Colab GPU, but I am...
Dear Yige, thanks a lot for sharing the code! I was wondering if you could provide some more detail on "further pre-training" on the IMDB dataset, e.g. the hyperparameter settings...
Hi, thanks for your great work. While running run_pretraining.py, I kept getting OOM errors for any matrix size. I already reduced the batch size to 1, but it didn't help...
Thanks for your hard work! I have two questions. First, for the Layer-wise Decreasing Layer Rate, did you use warm-up or polynomial_decay simultaneously? And does that mean the warm-up rate and the layer-wise...
Hi, sorry to bother you, but I have one question. Documents have multiple sentences, so how do you deal with that? Do you split the text into sentences and...
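For reference, the paper handles documents longer than BERT's 512-token limit by truncation rather than sentence splitting, and the best-performing variant keeps the head and tail of the text (first 128 and last 382 tokens). A minimal sketch of that head+tail strategy, assuming `tokens` is a list already produced by the BERT tokenizer (the function name and budget split are illustrative, not the repo's exact code):

```python
def head_tail_truncate(tokens, max_len=512, head_len=128):
    """Keep the first `head_len` and the last (max_len - 2 - head_len) tokens,
    reserving two slots for the [CLS] and [SEP] special tokens."""
    budget = max_len - 2  # room left after [CLS] and [SEP]
    if len(tokens) <= budget:
        kept = tokens
    else:
        tail_len = budget - head_len  # 382 tokens when max_len=512, head_len=128
        kept = tokens[:head_len] + tokens[-tail_len:]
    return ["[CLS]"] + kept + ["[SEP]"]
```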
The parser option for save_checkpoints_steps doesn't do anything for me. I'm running: `python3 run_classifier_single_layer.py --task_name imdb --do_train --do_eval --do_lower_case --data_dir ./stock --vocab_file ./uncased_L-12_H-768_A-12/vocab.txt --bert_config_file ./uncased_L-12_H-768_A-12/bert_config.json --init_checkpoint ./uncased_L-12_H-768_A-12/pytorch_model.bin --max_seq_length 512 --train_batch_size...
In Section 5.4.3: "We find that assign a lower learning rate to the lower layer is effective to fine-tuning BERT, and an appropriate setting is ξ=0.95 and lr=2.0e-5."...
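For anyone looking for a concrete picture of this layer-wise decreasing learning rate: with decay factor ξ=0.95 and base lr=2.0e-5, a layer that sits k levels below the top gets lr·ξ^k, so lower layers are updated more gently. A minimal PyTorch sketch, assuming a Hugging Face-style `BertModel` with `model.encoder.layer` (the parameter grouping here is illustrative, not the repo's exact implementation):

```python
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
base_lr, xi = 2.0e-5, 0.95  # settings quoted from Section 5.4.3

# One parameter group per encoder layer; the LR shrinks by a factor of xi
# for every layer below the top one.
layers = list(model.encoder.layer)
param_groups = []
for depth_from_top, layer in enumerate(reversed(layers)):
    param_groups.append({
        "params": layer.parameters(),
        "lr": base_lr * (xi ** depth_from_top),
    })
# Embeddings sit below the lowest encoder layer, so they get the smallest LR.
param_groups.append({
    "params": model.embeddings.parameters(),
    "lr": base_lr * (xi ** len(layers)),
})

optimizer = torch.optim.AdamW(param_groups)
```

A warm-up schedule can still be applied on top of this, since LambdaLR-style schedulers scale every parameter group's learning rate by the same multiplicative factor, leaving the layer-wise ratios intact.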