marian-dev

Interoperability of models between frameworks

Open sianvolta opened this issue 5 years ago • 5 comments

Feature description

https://huggingface.co/transformers/model_doc/marian.html has instructions on how to use the pretrained models.

Is there official documentation from Marian's end on converting a Marian model to Huggingface's interface?

Example

$ ls model-dir
model.npz
vocab.src.spm
vocab.trg.spm 
model.npz.decoder.yml

$ cd model-dir 

$ marian-interop \
    --src-vocab vocab.src.spm --trg-vocab vocab.trg.spm \
    --model-binary model.npz --model-config model.npz.decoder.yml \
    --option marian2huggingface \
    --output-dir model-dir-huggingface
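
Note that `marian-interop` above is the command being requested, not an existing tool. Whatever converter ends up being used, it can help to check first that the model directory contains the files shown in the listing. A minimal sketch, assuming the file names from the `ls` output above (the helper name `missing_marian_files` is made up for illustration):

```python
from pathlib import Path

# File names taken from the `ls model-dir` listing above. Other Marian
# exports may use different names, so treat this set as an assumption.
EXPECTED_FILES = (
    "model.npz",
    "model.npz.decoder.yml",
    "vocab.src.spm",
    "vocab.trg.spm",
)


def missing_marian_files(model_dir):
    """Return the expected file names that are absent from model_dir, sorted."""
    root = Path(model_dir)
    return sorted(name for name in EXPECTED_FILES if not (root / name).is_file())


if __name__ == "__main__":
    missing = missing_marian_files("model-dir")
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("all expected files present")
```

An empty list means the directory at least has the weights, the decoder config, and both SentencePiece vocabularies that a converter would need to read.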

sianvolta avatar Nov 10 '20 01:11 sianvolta

It would be even more exciting if the model could also be converted to https://github.com/pytorch/fairseq or even something like https://github.com/awslabs/sockeye or https://github.com/OpenNMT/OpenNMT-py

sianvolta avatar Nov 10 '20 01:11 sianvolta

Hi, the conversion script on huggingface is their implementation. I did help a bit, but we do not really provide conversions for other frameworks. Mostly because we don't use them that much, so the motivation for us is a bit lower. We are happy to accept conversion scripts to be included here if provided.

emjotde avatar Nov 10 '20 02:11 emjotde

@sshleifer and someone from @Helsinki-NLP / @jorgtied might be able to contribute.

sianvolta avatar Nov 10 '20 11:11 sianvolta

Instructions for converting a Tatoeba-Challenge model (a Marian model) to huggingface: https://github.com/sshleifer/transformers_fork/blob/46509d1c19b9e69d75fb95d33d38dbac4f6f8858/scripts/tatoeba/README.md#L30-L30

The convert function does the heavy lifting: https://github.com/huggingface/transformers/blob/master/src/transformers/convert_marian_to_pytorch.py#L567
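
Once a model has been converted, loading it should look roughly like the sketch below. It uses `MarianMTModel` and `MarianTokenizer` from `transformers`; the directory name `model-dir-huggingface` is only the hypothetical output path from the example earlier in this thread, and `torch` is assumed to be installed for `return_tensors="pt"`.

```python
def load_converted(model_dir):
    """Load a converted Marian checkpoint (needs `transformers` and `torch`)."""
    # Imported lazily so this file can be read without the dependency installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_dir)
    model = MarianMTModel.from_pretrained(model_dir)
    return model, tokenizer


def translate(texts, model, tokenizer):
    """Translate a list of source sentences and return a list of strings."""
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


# Usage (assumes the hypothetical output directory exists):
# model, tokenizer = load_converted("model-dir-huggingface")
# print(translate(["Hello world!"], model, tokenizer))
```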

sshleifer avatar Nov 10 '20 14:11 sshleifer

Hello,

> Would be even more exciting if it can be converted to https://github.com/pytorch/fairseq or even something like https://github.com/awslabs/sockeye or https://github.com/OpenNMT/OpenNMT-py

On this subject, we added a converter for Marian models to CTranslate2, which is the OpenNMT inference framework. There is an example of how to convert an OPUS-MT model here.
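
For readers who have already run the converter, translating with the result might look like the sketch below. It relies on `ctranslate2.Translator.translate_batch` and the `sentencepiece` Python wrapper; all paths in the usage comment are placeholders, and the function names are made up for illustration.

```python
def load_ct2(model_dir, src_spm, trg_spm):
    """Open a converted model plus its SentencePiece vocabularies.

    Needs the `ctranslate2` and `sentencepiece` packages.
    """
    import ctranslate2
    import sentencepiece as spm

    translator = ctranslate2.Translator(model_dir)
    sp_source = spm.SentencePieceProcessor(model_file=src_spm)
    sp_target = spm.SentencePieceProcessor(model_file=trg_spm)
    return translator, sp_source, sp_target


def translate_with_ct2(text, translator, sp_source, sp_target):
    """Tokenize, translate one sentence, and detokenize the best hypothesis."""
    tokens = sp_source.encode(text, out_type=str)
    results = translator.translate_batch([tokens])
    return sp_target.decode(results[0].hypotheses[0])


# Usage (placeholder paths, not taken from this thread):
# translator, sp_src, sp_trg = load_ct2("converted-model", "vocab.src.spm", "vocab.trg.spm")
# print(translate_with_ct2("Hello world!", translator, sp_src, sp_trg))
```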

guillaumekln avatar Mar 04 '22 14:03 guillaumekln