Convergence problem with xlm-roberta-base on SocialIQA
Environment info

- `adapter-transformers` version: 2.0.0
- Platform: Linux
- Python version: 3.7.7
- PyTorch version (GPU?): GPU
Details
I can find a suitable learning rate for the SocialIQA task when I fully fine-tune xlm-roberta, but when I train an adapter on SocialIQA, I cannot find a learning rate or matching hyperparameters that make the task converge. I tried learning rates on the order of 1e-4, 1e-5, and 1e-6, using the Adam/AdamW optimizer. The code I used is:
```python
self.bert.model.add_adapter('mix_all', config=adapter_config)
enc_hidden_size = self.bert.model.embeddings.word_embeddings.weight.shape[1]
self.score_cal = Final_layer(enc_hidden_size, std=args.adapter_initializer_range)
# keep the custom scoring head trainable
for i in self.score_cal.parameters():
    i.requires_grad = True
self.bert.model.train_adapter('mix_all')
```
score_cal consists of two linear layers, W'(tanh(Wx + b)), initialized via nn.Linear.
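For reference, a minimal sketch of what such a head could look like (the class name and the `std` argument follow the snippet above; the layer sizes and exact initialization are assumptions):

```python
import torch
import torch.nn as nn

class Final_layer(nn.Module):
    """Two-layer scoring head, W'(tanh(Wx + b)). A sketch, not the author's exact code."""
    def __init__(self, hidden_size, std=0.02):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, 1)
        # initialize both layers with the given standard deviation
        for layer in (self.dense, self.out):
            nn.init.normal_(layer.weight, std=std)
            nn.init.zeros_(layer.bias)

    def forward(self, x):
        return self.out(torch.tanh(self.dense(x)))
```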
Hi @probe2,
Sorry for the late reply; does this problem still persist?
If yes, what performance do you get when training an adapter on Social IQA, i.e. how big is the gap between full fine-tuning and adapter training? A slight performance decrease when switching from fine-tuning to adapter training is common. Regarding learning rates, values around 1e-4 usually work well in our experience. However, you would usually train for more epochs, e.g. 10-15 instead of the 3 typical for full fine-tuning.
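For illustration, a minimal setup along these lines might look as follows (the `AutoModelWithHeads` class is from adapter-transformers 2.x; the multiple-choice head with 3 choices is an assumption based on SocialIQA being a 3-way task, and the training loop itself is omitted):

```python
import torch
from transformers import AutoModelWithHeads  # adapter-transformers 2.x

model = AutoModelWithHeads.from_pretrained("xlm-roberta-base")
model.add_adapter("socialiqa")
model.add_multiple_choice_head("socialiqa", num_choices=3)
model.train_adapter("socialiqa")  # freezes base weights; only adapter (+ head) is trained

# learning rate around 1e-4, more epochs than for full fine-tuning
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
num_epochs = 15
```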
Hi @calpt, I tried different learning rates and numbers of epochs, but xlm-roberta still failed to converge. By "failure to converge" I mean that the training loss never decreased, so the model's performance stayed at random chance, which is 33% here.
I only added the task adapter, without a language adapter, but it stands to reason that with just a task adapter the model should still be able to converge. Can you try it?
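As a sanity check, counting trainable parameters confirms that only the adapter and the custom head are updated after `train_adapter(...)` (a generic PyTorch snippet, assuming the `model` object from the code above):

```python
# count trainable vs. total parameters after train_adapter(...)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
```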
Thank you very much.
This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.
This issue was closed because it was stale for 14 days without any activity.