
Support different pre_norm and layernorm_embedding settings in TransformerSpec for huggingface EncoderDecoderModel

Open nlpcat opened this issue 2 years ago • 2 comments

Can we make a change to support different pre_norm and layernorm_embedding settings for the encoder and decoder in TransformerSpec for seq2seq models? That would allow converting flexible encoder-decoder models from huggingface/transformers such as bert2gpt: BERT (pre_norm=False, layernorm_embedding=True) + GPT (pre_norm=True, layernorm_embedding=False).

  • different pre_norm in encoder/decoder
  • different layernorm_embedding in encoder/decoder
  • different num_heads in encoder/decoder (it would also be great to have this)

I see that TransformerSpec already supports a different number of layers for the encoder and decoder.
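For context, this is the kind of mixed architecture meant here, built with huggingface/transformers (the checkpoint names are only examples):

```python
from transformers import EncoderDecoderModel

# BERT encoder (post-norm, with embedding LayerNorm) combined with a
# GPT-2 decoder (pre-norm, without embedding LayerNorm).
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "gpt2"
)
```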

nlpcat avatar Jun 15 '22 01:06 nlpcat

I noticed a small typo in the type hints while trying the conversion code: it should be Tuple[int, int]. https://github.com/OpenNMT/CTranslate2/blob/1bff07374ef9b355ff6dac0a5c510d1ed46a070e/python/ctranslate2/specs/transformer_spec.py#L19. cc @guillaumekln

nlpcat avatar Jun 15 '22 22:06 nlpcat

Thanks for pointing out the incorrect typing. I will fix that.

Regarding the initial request, we should be able to support this kind of architecture. We will check how to expand the Transformer specification.

guillaumekln avatar Jun 16 '22 07:06 guillaumekln

The PR above changes the TransformerSpec constructor to accept arbitrary encoder and decoder specifications. You should then be able to convert the bert2gpt2 model with a custom converter, but we will not support this model out of the box.
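With that change, a spec with different settings per side could look roughly like the sketch below. The TransformerEncoderSpec/TransformerDecoderSpec class names and the pre_norm/layernorm_embedding keyword arguments are taken from this issue and may not match the exact signatures in every CTranslate2 version:

```python
from ctranslate2.specs.transformer_spec import (
    TransformerDecoderSpec,
    TransformerEncoderSpec,
    TransformerSpec,
)

# BERT-style encoder: post-norm, with embedding LayerNorm.
encoder = TransformerEncoderSpec(12, 12, pre_norm=False, layernorm_embedding=True)

# GPT-2-style decoder: pre-norm, without embedding LayerNorm.
decoder = TransformerDecoderSpec(12, 12, pre_norm=True, layernorm_embedding=False)

# The TransformerSpec constructor now takes the two specs directly.
spec = TransformerSpec(encoder, decoder)
```

A custom converter would then be responsible for copying the BERT and GPT-2 weights into this spec.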

guillaumekln avatar Dec 06 '22 14:12 guillaumekln

What would a custom converter look like? Is there a good starting point for one?

Bachstelze avatar Jan 23 '24 15:01 Bachstelze
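For anyone looking for a starting point, here is a very rough sketch of such a converter. It assumes the ctranslate2.converters.Converter base class (subclasses implement _load and return a model spec) described in the CTranslate2 documentation; the Bert2Gpt2Converter name, its constructor argument, and the spec keyword arguments are all illustrative, and the actual weight mapping is only indicated as a TODO:

```python
from ctranslate2.converters import Converter
from ctranslate2.specs.transformer_spec import (
    TransformerDecoderSpec,
    TransformerEncoderSpec,
    TransformerSpec,
)


class Bert2Gpt2Converter(Converter):
    """Sketch of a converter for a huggingface EncoderDecoderModel (BERT encoder + GPT-2 decoder)."""

    def __init__(self, model_name_or_path: str):
        self._model_name_or_path = model_name_or_path

    def _load(self):
        from transformers import EncoderDecoderModel

        model = EncoderDecoderModel.from_pretrained(self._model_name_or_path)
        enc_cfg = model.config.encoder
        dec_cfg = model.config.decoder

        # Build a spec with different settings per side (keyword names are
        # assumptions based on this issue, not a tested recipe).
        spec = TransformerSpec(
            TransformerEncoderSpec(
                enc_cfg.num_hidden_layers,
                enc_cfg.num_attention_heads,
                pre_norm=False,
                layernorm_embedding=True,
            ),
            TransformerDecoderSpec(
                dec_cfg.n_layer,
                dec_cfg.n_head,
                pre_norm=True,
                layernorm_embedding=False,
            ),
        )

        # TODO: copy every encoder/decoder weight from `model` into the
        # corresponding spec variables and register the two vocabularies,
        # following the converters shipped in python/ctranslate2/converters/.
        return spec


# Hypothetical usage:
# Bert2Gpt2Converter("path/to/bert2gpt2").convert("ct2_model")
```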