transformers
Distillation training for the Arabic language
System Info
I encountered two issues while attempting to run the `binarized_data.py` and `train.py` scripts for the Knowledge Distillation of BERT Language Model on the Arabic Language project. Below are the details of each issue:
- In the `binarized_data.py` script, I had to modify line 83 to make it work. The original line is:

  ```python
  dp_file = f"{args.dump_file}.{args.tokenizer_name}.pickle"
  ```

  I had to remove the `tokenizer_name` variable and change the line to:

  ```python
  dp_file = f"{args.dump_file}.pickle"
  ```

  This change was necessary because the Arabic BERT model name, "asafaya/bert-large-arabic", contains a forward slash ("/"), which caused errors when it was concatenated into the file path via the `tokenizer_name` variable.
- In the `train.py` script, I made a modification on line 258. The original line is:

  ```python
  args.max_model_input_size = tokenizer.max_model_input_sizes[args.teacher_name]
  ```

  I had to change it to:

  ```python
  args.max_model_input_size = tokenizer.max_model_input_sizes['bert-large-uncased']
  ```

  This modification was necessary because I am using different model configurations than those listed in the folder. It would be helpful if the script could resolve the intended config automatically, allowing for more flexibility.
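For the first issue, rather than dropping the tokenizer name from the path entirely, the script could sanitize it. A minimal sketch, assuming the two arguments mirror the `args.dump_file` and `args.tokenizer_name` fields in `binarized_data.py` (the helper name is my own):

```python
def safe_dump_path(dump_file: str, tokenizer_name: str) -> str:
    """Build the pickle path, replacing '/' in Hub model IDs
    (e.g. 'asafaya/bert-large-arabic') so the tokenizer name stays
    part of the filename instead of creating a bogus subdirectory."""
    safe_name = tokenizer_name.replace("/", "-")
    return f"{dump_file}.{safe_name}.pickle"

print(safe_dump_path("data/dump", "asafaya/bert-large-arabic"))
# -> data/dump.asafaya-bert-large-arabic.pickle
```

This keeps the tokenizer name in the dump filename, so binarized data from different tokenizers does not overwrite itself.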
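For the second issue, the lookup on line 258 of `train.py` could fall back gracefully instead of hard-coding a key, since most Hub model IDs (including ones with "/") are not keys in `tokenizer.max_model_input_sizes`. A sketch, assuming the table behaves like that dict (the helper name and the default value are my own; in practice `tokenizer.model_max_length` would be a natural fallback):

```python
def resolve_max_input_size(max_sizes: dict, teacher_name: str, fallback: int = 512) -> int:
    """Look up the teacher's registered max input size; fall back to a
    default when the model ID is not in the table, as happens with Hub
    IDs like 'asafaya/bert-large-arabic'."""
    return max_sizes.get(teacher_name, fallback)
```

With this, the script would work unchanged for both the bundled configs and custom teacher models.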
Apart from these script modifications, I also made the necessary changes to the config files to match the models I am using. This is expected, since my model has a different config than the one listed in the folder; perhaps the script could be modified to download and locate the necessary config file automatically.
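On the automatic config download: `transformers.AutoConfig.from_pretrained(model_id)` already resolves and caches a model's `config.json` from the Hub. The URL it resolves can be sketched as below (my reading of the Hub's resolve-URL layout, for illustration only):

```python
def hub_config_url(model_id: str, revision: str = "main") -> str:
    """URL where the Hugging Face Hub serves a model's config.json;
    in practice AutoConfig.from_pretrained(model_id) downloads and
    caches this file for you."""
    return f"https://huggingface.co/{model_id}/resolve/{revision}/config.json"
```

So the distillation scripts could take the teacher/student IDs and fetch matching configs instead of relying on the files checked into the folder.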
Please let me know if any further clarification is needed or if you require additional information to address these issues.
Who can help?
No response
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Here is a link to the Google Colab notebook that reproduces the problem: https://colab.research.google.com/drive/1OqSvRNMl0-Z7ScCd6hLbPHMO-ZXT3WEw?usp=sharing
Expected behavior
The model should start training smoothly, and the script should be able to handle model names that contain '/'.
Please use the forums for such questions. This is not a maintained example.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.