CTranslate2
Support different pre_norm and layernorm_embedding in TransformerSpec for Hugging Face EncoderDecoderModel
Can we make a change to support different pre_norm and layernorm_embedding settings for the encoder and decoder in TransformerSpec for seq2seq models? That would allow converting flexible encoder-decoder models from huggingface/transformers, such as bert2gpt (BERT encoder: pre_norm=False, layernorm_embedding=True; GPT decoder: pre_norm=True, layernorm_embedding=False). Specifically (see the sketch below):
- different pre_norm in encoder/decoder
- different layernorm_embedding in encoder/decoder
- different num_heads in encoder/decoder (it would also be great to have this)
I saw that different numbers of encoder/decoder layers are already supported in TransformerSpec.
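For concreteness, here is a rough sketch of the kind of configuration this would enable for a bert2gpt-style model. The tuple-valued pre_norm and layernorm_embedding arguments are hypothetical (they are exactly what is being requested), not the current API; only the tuple-valued layer count exists today.

```python
from ctranslate2 import specs

# Hypothetical interface: per-side values for pre_norm and layernorm_embedding.
# Only the (encoder_layers, decoder_layers) tuple is supported today; the other
# tuple arguments illustrate the request.
model_spec = specs.TransformerSpec(
    (12, 12),                           # (encoder layers, decoder layers), already supported
    12,                                 # attention heads
    pre_norm=(False, True),             # hypothetical: BERT encoder post-norm, GPT decoder pre-norm
    layernorm_embedding=(True, False),  # hypothetical: embedding LayerNorm only on the BERT side
)
```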
I noticed a small typo in the type hints while trying the conversion code: it should be Tuple[int, int]. https://github.com/OpenNMT/CTranslate2/blob/1bff07374ef9b355ff6dac0a5c510d1ed46a070e/python/ctranslate2/specs/transformer_spec.py#L19. cc @guillaumekln
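For reference, given the discussion above about per-side layer counts, the corrected annotation would presumably look something like this (abbreviated and illustrative only; see the linked line for the actual context):

```python
from typing import Tuple, Union

class TransformerSpec:
    # Abbreviated sketch: the point is only the Tuple[int, int] annotation,
    # which allows (encoder_layers, decoder_layers) to be passed as a pair.
    def __init__(self, num_layers: Union[int, Tuple[int, int]], num_heads: int):
        ...
```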
Thanks for pointing out the incorrect type hint. I will fix it.
Regarding the initial request, we should be able to support this kind of architecture. We will check how to expand the Transformer specification.
The PR above changes the TransformerSpec constructor to accept arbitrary encoder and decoder specifications. You should then be able to convert the bert2gpt2 model with a custom converter, but we will not support this model out of the box.
What would a custom converter look like? Is there a good starting point for one?
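For later readers, here is a rough sketch of the shape a custom converter could take: a small class that builds a model spec with distinct encoder/decoder settings and fills in the weights. The `_load` hook mirrors how the built-in converters are structured, but the encoder/decoder spec arguments and the weight attribute paths below are assumptions, so check the converters shipped with the package for the exact API in your version.

```python
from ctranslate2.converters import Converter
from ctranslate2.specs import transformer_spec


class Bert2GPT2Converter(Converter):
    """Sketch only: build a spec with separate encoder/decoder settings, then fill in weights."""

    def __init__(self, model_path):
        self._model_path = model_path

    def _load(self):
        # The TransformerSpec constructor now accepts encoder and decoder
        # specifications (per the PR above); the keyword arguments here are
        # assumptions, not verified signatures.
        encoder = transformer_spec.TransformerEncoderSpec(
            num_layers=12, num_heads=12, pre_norm=False, layernorm_embedding=True
        )
        decoder = transformer_spec.TransformerDecoderSpec(
            num_layers=12, num_heads=12, pre_norm=True, layernorm_embedding=False
        )
        spec = transformer_spec.TransformerSpec(encoder, decoder)

        # Load the Hugging Face checkpoint from self._model_path and copy each
        # tensor into the matching spec attribute, for example (names illustrative):
        #   spec.encoder.embeddings.weight = bert_state_dict["embeddings.word_embeddings.weight"]
        #   spec.decoder.embeddings.weight = gpt2_state_dict["wte.weight"]
        return spec
```

A good starting point is to read one of the existing converters in python/ctranslate2/converters/ and adapt the weight-mapping logic to the Hugging Face checkpoint layout.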