CTranslate2
Support different pre_norm and layernorm_embedding in TransformerSpec for Hugging Face EncoderDecoderModel
Can we make a change to support different pre_norm and layernorm_embedding settings for the encoder and decoder in TransformerSpec for seq2seq models? That would allow converting flexible encoder-decoder models from huggingface/transformers, such as bert2gpt (BERT encoder: pre_norm=False, layernorm_embedding=True; GPT decoder: pre_norm=True, layernorm_embedding=False). Specifically (see the sketch below):
- different pre_norm in encoder/decoder
- different layernorm_embedding in encoder/decoder
- different num_heads in encoder/decoder (it would also be great to have this)
I saw that different numbers of encoder/decoder layers are already supported in TransformerSpec.
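For concreteness, here is a rough sketch of the kind of configuration this would enable for a bert2gpt-style model. The tuple-valued pre_norm and layernorm_embedding arguments are hypothetical (they are exactly what is being requested), not the current API; only the tuple-valued layer count exists today.

```python
from ctranslate2 import specs

# Hypothetical interface: per-side values for pre_norm and layernorm_embedding.
# Only the (encoder_layers, decoder_layers) tuple is supported today; the other
# tuple arguments illustrate the request.
model_spec = specs.TransformerSpec(
    (12, 12),                           # (encoder layers, decoder layers), already supported
    12,                                 # attention heads
    pre_norm=(False, True),             # hypothetical: BERT encoder post-norm, GPT decoder pre-norm
    layernorm_embedding=(True, False),  # hypothetical: embedding LayerNorm only on the BERT side
)
```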
I noticed a small typo in the type hints while trying the conversion code: it should be Tuple[int, int]. https://github.com/OpenNMT/CTranslate2/blob/1bff07374ef9b355ff6dac0a5c510d1ed46a070e/python/ctranslate2/specs/transformer_spec.py#L19. cc @guillaumekln
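For reference, given the discussion above about per-side layer counts, the corrected annotation would presumably look something like this (abbreviated and illustrative only; see the linked line for the actual context):

```python
from typing import Tuple, Union

class TransformerSpec:
    # Abbreviated sketch: the point is only the Tuple[int, int] annotation,
    # which allows (encoder_layers, decoder_layers) to be passed as a pair.
    def __init__(self, num_layers: Union[int, Tuple[int, int]], num_heads: int):
        ...
```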
Thanks for pointing out the incorrect type hint. I will fix it.
Regarding the initial request, we should be able to support this kind of architecture. We will check how to expand the Transformer specification.
The PR above changes the TransformerSpec constructor to accept arbitrary encoder and decoder specifications. You should then be able to convert the bert2gpt2 model with a custom converter, but we will not support this model out of the box.
What would a custom converter look like? Is there a good starting point for one?
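For later readers, here is a rough sketch of the shape a custom converter could take: a small class that builds a model spec with distinct encoder/decoder settings and fills in the weights. The `_load` hook mirrors how the built-in converters are structured, but the encoder/decoder spec arguments and the weight attribute paths below are assumptions, so check the converters shipped with the package for the exact API in your version.

```python
from ctranslate2.converters import Converter
from ctranslate2.specs import transformer_spec


class Bert2GPT2Converter(Converter):
    """Sketch only: build a spec with separate encoder/decoder settings, then fill in weights."""

    def __init__(self, model_path):
        self._model_path = model_path

    def _load(self):
        # The TransformerSpec constructor now accepts encoder and decoder
        # specifications (per the PR above); the keyword arguments here are
        # assumptions, not verified signatures.
        encoder = transformer_spec.TransformerEncoderSpec(
            num_layers=12, num_heads=12, pre_norm=False, layernorm_embedding=True
        )
        decoder = transformer_spec.TransformerDecoderSpec(
            num_layers=12, num_heads=12, pre_norm=True, layernorm_embedding=False
        )
        spec = transformer_spec.TransformerSpec(encoder, decoder)

        # Load the Hugging Face checkpoint from self._model_path and copy each
        # tensor into the matching spec attribute, for example (names illustrative):
        #   spec.encoder.embeddings.weight = bert_state_dict["embeddings.word_embeddings.weight"]
        #   spec.decoder.embeddings.weight = gpt2_state_dict["wte.weight"]
        return spec
```

A good starting point is to read one of the existing converters in python/ctranslate2/converters/ and adapt the weight-mapping logic to the Hugging Face checkpoint layout.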