
QE - Distilled model

Open felipesantosk opened this issue 2 years ago • 1 comments

This issue tracks the port of the QE distilled model from deepQuest to Marian.

PyTorch BiRNN model:

BiRNN(
  (_text_field_embedder_src): BasicTextFieldEmbedder(
    (token_embedder_tokens): Embedding()
  )
  (_text_field_embedder_tgt): BasicTextFieldEmbedder(
    (token_embedder_tokens): Embedding()
  )
  (seq2seq_encoder_src): GruSeq2SeqEncoder(
    (_module): GRU(50, 50, batch_first=True, bidirectional=True)
  )
  (seq2seq_encoder_tgt): GruSeq2SeqEncoder(
    (_module): GRU(50, 50, batch_first=True, bidirectional=True)
  )
  (attention): DotProductAttention()
  (_linear_layer_src): Linear(in_features=100, out_features=100, bias=True)
  (_linear_layer_tgt): Linear(in_features=100, out_features=100, bias=True)
  (_dropout): Dropout(p=0.5, inplace=False)
  (_linear_layer): Linear(in_features=200, out_features=1, bias=True)
  (_loss): MSELoss()
)
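As a sanity check for the port, the layer sizes above imply fixed parameter counts that the converted tensors can be compared against. The sketch below is plain arithmetic based on the standard `torch.nn.GRU` and `torch.nn.Linear` parameter layouts (single layer, `bias=True`, as the repr suggests). The helper names are hypothetical, and the embeddings are left out because their sizes are not shown in the dump.

```python
def gru_param_count(input_size, hidden_size, bidirectional=True):
    """Parameter count of a single-layer torch.nn.GRU with bias=True."""
    gates = 3  # reset, update and new gates are stacked in each weight/bias
    per_direction = gates * (
        hidden_size * input_size     # weight_ih_l0
        + hidden_size * hidden_size  # weight_hh_l0
        + 2 * hidden_size            # bias_ih_l0 and bias_hh_l0
    )
    return per_direction * (2 if bidirectional else 1)


def linear_param_count(in_features, out_features):
    """Parameter count of torch.nn.Linear with bias=True."""
    return in_features * out_features + out_features


# Module names are taken from the model repr; attention, dropout and the
# loss carry no parameters, and embedding sizes are unknown here.
expected_counts = {
    "seq2seq_encoder_src": gru_param_count(50, 50),
    "seq2seq_encoder_tgt": gru_param_count(50, 50),
    "_linear_layer_src": linear_param_count(100, 100),
    "_linear_layer_tgt": linear_param_count(100, 100),
    "_linear_layer": linear_param_count(200, 1),
}

for name, count in expected_counts.items():
    print(f"{name}: {count}")
print("total (without embeddings):", sum(expected_counts.values()))
```

Each bidirectional GRU emits 2 × 50 = 100 features per token, which matches the `in_features=100` of the two per-side linear layers.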

Port tasks:

Related Marian PR - https://github.com/browsermt/marian-dev/pull/76

felipesantosk avatar Mar 01 '22 18:03 felipesantosk

@felipesantosk Thank you for opening the requested PRs. I have the following suggestions:

Remove the hardcoded paths from the C++ and Python code. In C++, use CLI parsing (there should be an includable variant of CLI11). In Python, you should be able to use argparse.
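On the Python side, a minimal argparse sketch could look like the following. The flag names (`--model-dir`, `--output`, `--tolerance`), their defaults, and the example path are hypothetical placeholders, not options of any existing script.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build the CLI for a (hypothetical) QE check script."""
    parser = argparse.ArgumentParser(
        description="Run the PyTorch QE model and dump reference outputs."
    )
    # Hypothetical flags replacing the previously hardcoded paths.
    parser.add_argument("--model-dir", type=Path, required=True,
                        help="directory containing the deepQuest model files")
    parser.add_argument("--output", type=Path, default=Path("expected-scores.txt"),
                        help="where to write the reference scores")
    parser.add_argument("--tolerance", type=float, default=1e-5,
                        help="maximum absolute difference accepted in comparisons")
    return parser


# Parsing an explicit list keeps the example independent of sys.argv;
# a real script would call build_parser().parse_args() instead.
args = build_parser().parse_args(["--model-dir", "models/qe"])
print(args.model_dir, args.output, args.tolerance)
```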

With both in place, add a shell script that serves as documentation for running both (the Python check scripts and the C++ converter) and reports differences. The shell script may fetch the Python (deepQuest) model and use it to write out the .zips requested in https://github.com/felipesantosk/bergamot-translator/issues/2#issuecomment-1055777416, making the process ahead easier.

This should make it easy for reviewers to run things in case more hands-on help is required, and it makes it possible to attach a check via GitHub Actions.

Bear in mind you can parallelize development: if you're stuck waiting on inputs from me or @graemenail, you can unit test the port of the Linear layer etc. using random inputs (PyTorch should allow you to pick out select tensors or nn.Modules). If the units work, the whole should work.
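A rough sketch of such a unit check, in plain Python so it runs without PyTorch or Marian: `reference_linear` follows the `torch.nn.Linear` convention (`y = x W^T + b`, weight stored as `(out_features, in_features)`), while `ported_linear` is only a stand-in for the ported layer. Its transposed weight layout is an assumption made for illustration, not a statement about how Marian stores weights.

```python
import random


def reference_linear(x, weight, bias):
    """y = x W^T + b with weight shaped (out_features, in_features)."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def ported_linear(x, weight_t, bias):
    """Stand-in for the ported layer; takes weight shaped (in, out)."""
    out = list(bias)
    for i, v in enumerate(x):
        for j, w in enumerate(weight_t[i]):
            out[j] += v * w
    return out


def max_abs_diff(a, b):
    return max(abs(p - q) for p, q in zip(a, b))


def check_linear(in_features=100, out_features=100, trials=10,
                 tol=1e-9, seed=0):
    """Feed identical random inputs to both versions and compare outputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        weight = [[rng.uniform(-1, 1) for _ in range(in_features)]
                  for _ in range(out_features)]
        bias = [rng.uniform(-1, 1) for _ in range(out_features)]
        x = [rng.uniform(-1, 1) for _ in range(in_features)]
        weight_t = [list(col) for col in zip(*weight)]  # transpose for the port
        diff = max_abs_diff(reference_linear(x, weight, bias),
                            ported_linear(x, weight_t, bias))
        worst = max(worst, diff)
    return worst <= tol, worst


ok, worst = check_linear()
print("linear port matches:", ok, "max abs diff:", worst)
```

In an actual check, the reference outputs would come from the PyTorch module and the other side from the converted model, with the tolerance chosen to suit float32.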

I think the different SentencePiece might run into trouble with Marian's equivalent SentencePiece vocab (because some parameters are hardcoded in some way), but we should be able to fix that eventually.

jerinphilip avatar Mar 04 '22 15:03 jerinphilip

Ended up not being used, sorry about that.

XapaJIaMnu avatar Jul 31 '23 14:07 XapaJIaMnu