argos-train Adding to an existing language model

If I wanted to add to an existing model, how would I do that? I have topic-specific language from scientific domains that I would like to add.

I didn't see anything in the open or closed tickets.

Thanks!

Oct 01 '23 23:10 mbachtell

I have the same question. I do not know whether this is currently achievable, but the question looks like a duplicate of https://github.com/argosopentech/argos-train/issues/12 (which says it is not straightforward). From a mathematical point of view, this is doable, since it is a transformer model. From a software point of view, fine-tuning a transformer on a custom dataset is already done in other fields, see for instance:

https://gmihaila.github.io/tutorial_notebooks/finetune_transformers_pytorch/
https://medium.com/@lokaregns/fine-tuning-transformers-with-custom-dataset-classification-task-f261579ae068

I believe there is at least one strategy:

1. take the initial aligned bilingual corpus of the existing model,
2. add your aligned documents to that corpus,
3. retrain the model on the new corpus.

This would require adding local data, see https://github.com/argosopentech/argos-train/issues/24
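For what it's worth, the corpus-merging part of that strategy could look roughly like the sketch below. It assumes each corpus is stored as two line-aligned plain-text files (one sentence per line, source and target in the same order). The function names and the `domain_repeat` oversampling knob are hypothetical and are not part of argos-train. The retraining step itself is not shown.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def read_parallel(src_path: str, tgt_path: str) -> List[Pair]:
    """Read two line-aligned files into a list of sentence pairs."""
    with open(src_path, encoding="utf-8") as f_src, \
            open(tgt_path, encoding="utf-8") as f_tgt:
        src_lines = f_src.read().splitlines()
        tgt_lines = f_tgt.read().splitlines()
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target files are not line-aligned")
    return list(zip(src_lines, tgt_lines))


def merge_parallel_corpora(
    base: Iterable[Pair],
    domain: Iterable[Pair],
    domain_repeat: int = 1,
) -> List[Pair]:
    """Merge base and domain pairs, skipping empty and duplicate pairs.

    domain_repeat > 1 repeats the new domain pairs so they are not
    drowned out by a much larger base corpus (an optional assumption,
    not something the existing tooling is known to do).
    """
    seen = set()
    merged: List[Pair] = []
    domain_unique: List[Pair] = []
    for pairs, is_domain in ((base, False), (domain, True)):
        for src, tgt in pairs:
            pair = (src.strip(), tgt.strip())
            if not pair[0] or not pair[1] or pair in seen:
                continue
            seen.add(pair)
            merged.append(pair)
            if is_domain:
                domain_unique.append(pair)
    # The first copy of each domain pair is already in `merged`.
    merged.extend(domain_unique * (domain_repeat - 1))
    return merged


def write_parallel(pairs: Iterable[Pair], src_path: str, tgt_path: str) -> None:
    """Write sentence pairs back out as two line-aligned files."""
    with open(src_path, "w", encoding="utf-8") as f_src, \
            open(tgt_path, "w", encoding="utf-8") as f_tgt:
        for src, tgt in pairs:
            f_src.write(src + "\n")
            f_tgt.write(tgt + "\n")
```

The merged files would then be used as the training corpus in place of the original one, provided local data can be supplied as discussed in issue 24.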

Nov 04 '23 13:11 mayeulk

Duplicate of #12

Nov 04 '23 13:11 mayeulk