bergamot-translator QE - Distilled model

This PR tracks the port of the QE distilled model from deepQuest to Marian.

PyTorch BiRNN model:

BiRNN(
  (_text_field_embedder_src): BasicTextFieldEmbedder(
    (token_embedder_tokens): Embedding()
  )
  (_text_field_embedder_tgt): BasicTextFieldEmbedder(
    (token_embedder_tokens): Embedding()
  )
  (seq2seq_encoder_src): GruSeq2SeqEncoder(
    (_module): GRU(50, 50, batch_first=True, bidirectional=True)
  )
  (seq2seq_encoder_tgt): GruSeq2SeqEncoder(
    (_module): GRU(50, 50, batch_first=True, bidirectional=True)
  )
  (attention): DotProductAttention()
  (_linear_layer_src): Linear(in_features=100, out_features=100, bias=True)
  (_linear_layer_tgt): Linear(in_features=100, out_features=100, bias=True)
  (_dropout): Dropout(p=0.5, inplace=False)
  (_linear_layer): Linear(in_features=200, out_features=1, bias=True)
  (_loss): MSELoss()
)

Port tasks:

[x] Source embedding - (PR Closed)
[x] Source Seq2seq encoder - (PR Closed)
[x] Source linear layer - (PR Closed)
[x] Source attention - (PR Closed)
[x] Source weighted sum - (PR Closed)
[x] Replicate above operations for the target tokens - (PR Closed)
[x] Concatenate the encoded source and target - (PR Closed)
[x] Apply squeeze, linear layer and sigmoid - (PR Closed)
[ ] Work in batch - (In Progress)
[ ] Port the same python vocab to marian
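
For orientation, here is a rough pure-Python sketch of the later steps in the list: dot-product attention, weighted sum, concatenation of source and target, then a linear layer and sigmoid. It is not the deepQuest or Marian code. The function names and the explicit `query` vectors are assumptions made for illustration, since this issue does not say how the attention query is formed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(states, query):
    """Dot-product attention followed by a weighted sum over token states.

    `states` is a list of per-token vectors; `query` is a hypothetical
    query vector (how the real model builds it is not stated here).
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * state[k] for w, state in zip(weights, states)) for k in range(dim)]


def qe_score(src_states, tgt_states, src_query, tgt_query, weight, bias):
    """Concatenate pooled source and target, apply a linear layer and sigmoid."""
    pooled = attend(src_states, src_query) + attend(tgt_states, tgt_query)  # list concatenation
    logit = sum(w * x for w, x in zip(weight, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

In the model printed above, the pooled source and target vectors would each have 100 features, which matches the final `Linear(in_features=200, out_features=1)`.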

Related Marian PR - https://github.com/browsermt/marian-dev/pull/76

Mar 01 '22 18:03 felipesantosk

@felipesantosk Thank you for opening the requested PRs. I have the following suggestions:

Remove the hardcoded paths from the C++ and Python code. In C++, use CLI parsing (there should be a variant of CLI11 that can be included). In Python, you should be able to use argparse.
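
On the Python side, something along these lines might be enough. This is a minimal sketch: the flag names (`--model-dir`, `--vocab`, `--output`) and the default output filename are hypothetical and should be adapted to whatever the check scripts actually need.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # All flag names below are placeholders, not existing options of any script.
    parser = argparse.ArgumentParser(
        description="Run a QE model check without hardcoded paths."
    )
    parser.add_argument("--model-dir", type=Path, required=True,
                        help="Directory holding the exported model files.")
    parser.add_argument("--vocab", type=Path, required=True,
                        help="Path to the vocabulary file.")
    parser.add_argument("--output", type=Path, default=Path("outputs.txt"),
                        help="Where to write the model outputs.")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv[1:]; passing a list eases testing.
    return build_parser().parse_args(argv)
```

A script would then call `args = parse_args()` and read `args.model_dir`, `args.vocab` and `args.output` instead of using fixed paths.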

With both in place, add a shell script that documents how to run both (the Python check scripts and the C++ converter) and reports differences. The script may fetch the Python (deepQuest) model and use it to write out the .zips requested in https://github.com/felipesantosk/bergamot-translator/issues/2#issuecomment-1055777416, which would make the rest of the process easier.

This should make it easy for reviewers to run everything in case more hands-on help is required, and it opens up the possibility of attaching a check via GitHub Actions.
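
The "report differences" step could be as small as the following sketch. It assumes both sides have already produced flat lists of floats; the function name, the returned keys and the default tolerance are illustrative choices, not something agreed in this thread.

```python
def report_differences(reference, ported, atol=1e-5):
    """Compare two equally long lists of floats element by element.

    Returns the number of compared values, the largest absolute difference,
    the index where it occurs, and the indices exceeding `atol`.
    """
    if len(reference) != len(ported):
        raise ValueError(
            f"length mismatch: {len(reference)} reference vs {len(ported)} ported values"
        )
    diffs = [abs(r - p) for r, p in zip(reference, ported)]
    worst = max(range(len(diffs)), key=diffs.__getitem__) if diffs else None
    return {
        "count": len(diffs),
        "max_abs_diff": diffs[worst] if diffs else 0.0,
        "worst_index": worst,
        "mismatches": [i for i, d in enumerate(diffs) if d > atol],
    }
```

A check in CI could then fail whenever `mismatches` is non-empty.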

Bear in mind that you can parallelize development: if you're stuck waiting on inputs from me or @graemenail, you can unit-test the port of the Linear layer, etc., using random inputs (PyTorch should allow you to pick out select tensors or nn.Modules). If the units work, the whole should work.
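
To illustrate the idea, here is a sketch of such a unit test in plain Python. `linear_ported` is only a stand-in for the ported layer (the real one would be the Marian implementation), and `linear_reference` mimics the `out_features x in_features` weight layout that PyTorch's `nn.Linear` uses. The helper names, seed and tolerance are assumptions.

```python
import random


def linear_reference(x, weight, bias):
    """y = W x + b with weight stored as out_features x in_features."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def linear_ported(x, weight_t, bias):
    """Stand-in for the ported layer; takes the transposed weight (in x out)."""
    out = list(bias)
    for i, xi in enumerate(x):
        for j, w in enumerate(weight_t[i]):
            out[j] += xi * w
    return out


def check_linear_port(in_features=100, out_features=100, trials=5, seed=0):
    """Feed the same random inputs to both layers; return the worst abs difference."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        weight = [[rng.uniform(-1, 1) for _ in range(in_features)]
                  for _ in range(out_features)]
        bias = [rng.uniform(-1, 1) for _ in range(out_features)]
        x = [rng.uniform(-1, 1) for _ in range(in_features)]
        weight_t = [list(col) for col in zip(*weight)]  # transpose for the port
        ref = linear_reference(x, weight, bias)
        got = linear_ported(x, weight_t, bias)
        worst = max(worst, max(abs(r - g) for r, g in zip(ref, got)))
    return worst
```

The two functions sum in a different order, so the results are compared against a small tolerance rather than for exact equality.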

I think the different SentencePiece vocab might run into trouble with Marian's equivalent SentencePiece vocab (because some parameters are hardcoded in some way), but we should be able to fix that eventually.

Mar 04 '22 15:03 jerinphilip

Ended up not being used, sorry about that.

Jul 31 '23 14:07 XapaJIaMnu
