Adding to an existing language model
How would I add to an existing model? I have topic-specific language from scientific domains that I would like to add.
I didn't see anything in the open or closed tickets.
Thanks!
I have the same question. I do not know whether this is currently possible, but the question looks like a duplicate of https://github.com/argosopentech/argos-train/issues/12 (which says it is not straightforward). In principle it is doable, since this is a transformer model and a transformer can be trained further on additional data. On the software side, fine-tuning transformers on custom datasets is already done for other tasks, see for instance:
- https://gmihaila.github.io/tutorial_notebooks/finetune_transformers_pytorch/
- https://medium.com/@lokaregns/fine-tuning-transformers-with-custom-dataset-classification-task-f261579ae068
I believe there is at least one strategy:
- taking the initial aligned bilingual corpus of the existing model
- adding your aligned documents to that corpus
- retraining the model on the new corpus.
This would require adding local data, see https://github.com/argosopentech/argos-train/issues/24
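
The first two steps of that strategy (combining the original corpus with your own aligned documents) can be sketched in plain Python. This is only an illustration and is not part of argos-train: the function names are made up, and it assumes each corpus is stored as two UTF-8 text files with one sentence per line, where line N of the source file is the translation of line N of the target file.

```python
from pathlib import Path


def read_parallel(src_path, tgt_path):
    """Read an aligned corpus stored as two line-aligned text files (assumed layout)."""
    src = Path(src_path).read_text(encoding="utf-8").splitlines()
    tgt = Path(tgt_path).read_text(encoding="utf-8").splitlines()
    if len(src) != len(tgt):
        raise ValueError(f"corpus is not aligned: {len(src)} vs {len(tgt)} lines")
    return list(zip(src, tgt))


def merge_corpora(base_pairs, domain_pairs):
    """Concatenate two aligned corpora, dropping empty and duplicate sentence pairs."""
    merged, seen = [], set()
    for src, tgt in [*base_pairs, *domain_pairs]:
        pair = (src.strip(), tgt.strip())
        # Skip pairs where either side is empty, and pairs already kept.
        if not pair[0] or not pair[1] or pair in seen:
            continue
        seen.add(pair)
        merged.append(pair)
    return merged


def write_parallel(pairs, src_path, tgt_path):
    """Write the merged corpus back out as two line-aligned files."""
    Path(src_path).write_text("".join(s + "\n" for s, _ in pairs), encoding="utf-8")
    Path(tgt_path).write_text("".join(t + "\n" for _, t in pairs), encoding="utf-8")
```

Usage would be something like `write_parallel(merge_corpora(read_parallel("base.en", "base.fr"), read_parallel("science.en", "science.fr")), "merged.en", "merged.fr")`, where all file names are placeholders. The merged files would then be the input for the retraining step, which depends on how argos-train accepts local data.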
Duplicate of #12