
Support for RNN-based decoder units

Open harishankar-gopalan opened this issue 2 years ago • 9 comments

Are there any plans to support inference for heterogeneous encoder-decoder architectures, where a Transformer-based encoder is paired with an RNN/LSTM-based decoder?

I would like to submit this as a new feature request.

harishankar-gopalan avatar May 04 '22 06:05 harishankar-gopalan

Currently there are no plans to support RNN-based decoders.

What is the framework you are using to train these models?

guillaumekln avatar May 04 '22 13:05 guillaumekln

Ah, OK! We would be using Fairseq for the student model. For faster inference, we wanted to check whether we can convert it to CTranslate2 with vmap support.

I would also be happy to contribute, but I am not sure where to start.

harishankar-gopalan avatar May 05 '22 06:05 harishankar-gopalan
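
For reference, a converted Fairseq Transformer can be loaded and queried through the CTranslate2 Python API. Below is a minimal sketch, assuming the checkpoint has already been converted with `ct2-fairseq-converter` and a vocabulary map has been added to the converted model directory as described in the CTranslate2 documentation; the model path and tokens are placeholders:

```python
# Minimal sketch: translate with a converted Fairseq model and a vocabulary map.
# "ct2_model" and the token batch are placeholders, not real data.
import ctranslate2

translator = ctranslate2.Translator("ct2_model", device="cpu")
results = translator.translate_batch(
    [["▁Hello", "▁world", "."]],  # pre-tokenized source (e.g. SentencePiece pieces)
    beam_size=2,
    use_vmap=True,                # restrict the target vocabulary with the vmap
)
print(" ".join(results[0].hypotheses[0]))
```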

Why not use a full Transformer model for the student? The model would be directly compatible with CTranslate2.

guillaumekln avatar May 05 '22 07:05 guillaumekln

Hi @guillaumekln, currently we are using a Transformer for both the encoder and the decoder. We want to move to a hybrid Transformer (encoder) + RNN (decoder) network to further reduce inference latency and increase throughput.

harishankar-gopalan avatar May 05 '22 11:05 harishankar-gopalan

@harishankar-gopalan Have you found any code for such Transformer (encoder) + RNN (decoder) networks? Transformer decoding is currently a bit slow for me, and I am also looking for such an encoder-decoder framework.

Andrewlesson avatar Jun 23 '22 04:06 Andrewlesson

Hi @Andrewlesson, no. We most probably intend to go with a custom Fairseq model, where we define a custom architecture in Fairseq. If that doesn't work out, we would have to go with vanilla PyTorch.

harishankar-gopalan avatar Jun 25 '22 11:06 harishankar-gopalan
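
As a rough illustration of the custom-architecture route mentioned above, a hybrid model could be registered in Fairseq by combining the stock Transformer encoder with the stock LSTM decoder. The sketch below is untested: it assumes the older Fairseq module layout (`base_architecture` in `fairseq.models.transformer`), the names and defaults are illustrative, and in practice the LSTM decoder would also need to be adapted to consume the Transformer encoder's output format. Note that such a model would still not be convertible to CTranslate2, per this issue:

```python
# Untested sketch: registering a hybrid Transformer-encoder / LSTM-decoder
# model in Fairseq. Names and defaults are illustrative; the LSTM decoder
# would need an adapter for the Transformer encoder's output format.
import torch.nn as nn

from fairseq.models import (
    FairseqEncoderDecoderModel,
    register_model,
    register_model_architecture,
)
from fairseq.models.lstm import LSTMDecoder
from fairseq.models.transformer import TransformerEncoder, base_architecture


@register_model("transformer_enc_lstm_dec")
class TransformerEncoderLSTMDecoder(FairseqEncoderDecoderModel):
    @classmethod
    def build_model(cls, args, task):
        src_dict, tgt_dict = task.source_dictionary, task.target_dictionary

        # Reuse the stock Transformer encoder.
        encoder_embed = nn.Embedding(
            len(src_dict), args.encoder_embed_dim, padding_idx=src_dict.pad()
        )
        encoder = TransformerEncoder(args, src_dict, encoder_embed)

        # Pair it with the stock attentional LSTM decoder.
        decoder = LSTMDecoder(
            dictionary=tgt_dict,
            embed_dim=args.decoder_embed_dim,
            hidden_size=args.decoder_embed_dim,
            encoder_output_units=args.encoder_embed_dim,
            attention=True,
        )
        return cls(encoder, decoder)


@register_model_architecture("transformer_enc_lstm_dec", "transformer_enc_lstm_dec_base")
def transformer_enc_lstm_dec_base(args):
    # Fill in the usual Transformer defaults for the encoder-side hyperparameters.
    base_architecture(args)
```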

Did you also consider training a Transformer model with a reduced number of decoder layers? CTranslate2 can run models with a different number of encoder and decoder layers.

guillaumekln avatar Jun 25 '22 18:06 guillaumekln
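
For reference, the layer counts are ordinary Fairseq hyperparameters (`--encoder-layers` / `--decoder-layers` on `fairseq-train`). A minimal sketch of pinning them in a named architecture instead, again assuming the older Fairseq module layout; the architecture name is illustrative:

```python
# Illustrative sketch: a named deep-encoder / shallow-decoder Transformer
# architecture for Fairseq. The same effect can be obtained by passing
# --encoder-layers 12 --decoder-layers 1 directly to fairseq-train, and the
# converted model runs in CTranslate2 like any other Transformer.
from fairseq.models import register_model_architecture
from fairseq.models.transformer import base_architecture


@register_model_architecture("transformer", "transformer_deep_enc_shallow_dec")
def transformer_deep_enc_shallow_dec(args):
    args.encoder_layers = getattr(args, "encoder_layers", 12)
    args.decoder_layers = getattr(args, "decoder_layers", 1)
    base_architecture(args)
```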

Yes, we are already using a deep-encoder, shallow-decoder architecture. We want to experiment with the performance of setups where the encoder and decoder stacks have distinct architectures of their own.

harishankar-gopalan avatar Jun 27 '22 03:06 harishankar-gopalan

We are training models with Marian (and using the Bergamot fork for quantization) for the Firefox Translations feature. The decoder is an RNN (see https://aclanthology.org/2020.ngt-1.26/). It would be nice to see what performance would look like with CTranslate2 (we are already planning to use it to speed up translations with the teacher models: https://github.com/mozilla/firefox-translations-training/issues/165).

marco-c avatar Dec 03 '23 23:12 marco-c
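
For the teacher-model comparison mentioned above, once a Marian Transformer teacher has been converted (CTranslate2 provides a Marian converter, `ct2-marian-converter`; the RNN-decoder students would not convert, per the discussion in this issue), a rough throughput check might look like the sketch below. The model path, thread counts, and token batch are placeholders:

```python
# Rough throughput check for a converted Marian Transformer teacher model.
# "ct2_teacher" and the token batch below are placeholders, not real data.
import time

import ctranslate2

translator = ctranslate2.Translator(
    "ct2_teacher", device="cpu", inter_threads=1, intra_threads=4
)

# CTranslate2 expects pre-tokenized input (e.g. SentencePiece pieces).
batch = [["▁Hello", "▁world", "."]] * 32

start = time.perf_counter()
results = translator.translate_batch(batch, beam_size=4)
elapsed = time.perf_counter() - start

print(f"{len(batch) / elapsed:.1f} sentences/s")
print(" ".join(results[0].hypotheses[0]))
```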