marian-dev
Interoperability of models between frameworks
Feature description
https://huggingface.co/transformers/model_doc/marian.html has instructions on how to use the pretrained models.
Is there an official doc from Marian's end to convert a Marian model to Huggingface's interface?
Example
$ ls model-dir
model.npz
vocab.src.spm
vocab.trg.spm
model.npz.decoder.yml
$ cd model-dir
$ marian-interop \
--src-vocab vocab.src.spm --trg-vocab vocab.trg.spm \
--model-binary model.npz --model-config model.npz.decoder.yml \
--option marian2huggingface \
--output-dir model-dir-huggingface
Would be even more exciting if it can be converted to https://github.com/pytorch/fairseq or even something like https://github.com/awslabs/sockeye or https://github.com/OpenNMT/OpenNMT-py
Hi, the conversion script on the Huggingface side is their own implementation. I helped a bit, but we don't really provide conversions to other frameworks, mostly because we don't use them much, so our motivation is lower. We are happy to accept conversion scripts for inclusion here if someone provides them.
@sshleifer and someone from @Helsinki-NLP / @jorgtied might be able to contribute.
Instructions for converting a Tatoeba-Challenge model (a Marian model) to Huggingface: https://github.com/sshleifer/transformers_fork/blob/46509d1c19b9e69d75fb95d33d38dbac4f6f8858/scripts/tatoeba/README.md#L30-L30
The convert function does the heavy lifting: https://github.com/huggingface/transformers/blob/master/src/transformers/convert_marian_to_pytorch.py#L567
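For anyone writing a converter for another framework, a useful first step is to look at which parameters the Marian checkpoint contains. Below is a minimal sketch, assuming model.npz is a standard NumPy .npz archive (a zip of .npy members). It is not the Huggingface implementation, and the names read_npy_header and list_parameters are made up for illustration. It reads only the headers, so no weights are loaded and nothing beyond the Python standard library is needed.

```python
import ast
import struct
import zipfile


def read_npy_header(stream):
    """Parse the header of one .npy member and return its metadata dict."""
    if stream.read(6) != b"\x93NUMPY":
        raise ValueError("not a .npy member")
    major, _minor = stream.read(2)
    # Format 1.x stores the header length in 2 bytes, later versions in 4.
    if major == 1:
        (header_len,) = struct.unpack("<H", stream.read(2))
    else:
        (header_len,) = struct.unpack("<I", stream.read(4))
    # The header is a Python dict literal padded with spaces and a newline.
    header = stream.read(header_len).decode("latin1").strip()
    return ast.literal_eval(header)


def list_parameters(npz_path):
    """Return {parameter name: (dtype string, shape)} for an .npz archive."""
    params = {}
    with zipfile.ZipFile(npz_path) as archive:
        for member in archive.namelist():
            if not member.endswith(".npy"):
                continue  # skip anything that is not an array
            with archive.open(member) as stream:
                meta = read_npy_header(stream)
            params[member[: -len(".npy")]] = (meta["descr"], meta["shape"])
    return params
```

Calling list_parameters("model.npz") in the model directory gives a name-to-shape table, which is the starting point for mapping Marian parameter names onto the target framework's layer names.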
Hello,
> Would be even more exciting if it can be converted to https://github.com/pytorch/fairseq or even something like https://github.com/awslabs/sockeye or https://github.com/OpenNMT/OpenNMT-py
On this subject, we added a converter for Marian models to CTranslate2, which is the OpenNMT inference framework. There is an example of how to convert an OPUS-MT model here.