Add Canine Model Config to AutoModelForCausalLM

Open itsbvk opened this issue 1 year ago • 4 comments

Feature request

Kindly add a class such as https://github.com/huggingface/transformers/blob/a9bd5df16a46356463f2712dd8f6c109fa83d6f9/src/transformers/models/bert/modeling_bert.py#L1161

for the CANINE model.

Basically, the CANINE model isn't listed among the models supported by AutoModelForCausalLM. Kindly add it.

Motivation

Currently I am unable to experiment with a CanineConfig-based LM decoder using this API.

Snippet of code used:

from transformers import ViTConfig, VisionEncoderDecoderConfig, VisionEncoderDecoderModel, CanineConfig
# taken from https://huggingface.co/docs/transformers/v4.26.1/en/model_doc/vision-encoder-decoder#transformers.VisionEncoderDecoderConfig.example
config_encoder = ViTConfig()
config_decoder = CanineConfig()
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
model = VisionEncoderDecoderModel(config=config)

Your contribution

Not yet, currently.

itsbvk avatar Mar 09 '23 22:03 itsbvk

Hi,

CANINE doesn't support causal attention. It can only be used as an encoder.

NielsRogge avatar Mar 10 '23 08:03 NielsRogge

Thanks @NielsRogge for pointing that out. Is there, then, any pre-trained language model similar to CANINE that processes tokens at the Unicode character level, i.e. whose tokenizer basically does

tokens = [ord(c) for c in string]

itsbvk avatar Mar 10 '23 13:03 itsbvk

You can leverage the decoder of ByT5, which is a byte-based model.
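
For example (a minimal sketch, assuming the google/byt5-small checkpoint; the decoder stack is exposed as the model's decoder attribute):

from transformers import AutoTokenizer, T5ForConditionalGeneration

# ByT5 operates directly on UTF-8 bytes, so there is no learned vocabulary
tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
model = T5ForConditionalGeneration.from_pretrained("google/byt5-small")
byte_level_decoder = model.decoder  # the decoder-only T5Stack

print(tokenizer("hello")["input_ids"])  # raw byte values offset by the special tokens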

NielsRogge avatar Mar 10 '23 14:03 NielsRogge

@NielsRogge I think that while ByT5 does have the tokenization I wanted, it still cannot be used with the VisionEncoderDecoder API of Hugging Face using a snippet like the one shown below:

from transformers import ViTConfig, VisionEncoderDecoderConfig, VisionEncoderDecoderModel, ByT5Config # this does not exist
# taken from https://huggingface.co/docs/transformers/v4.26.1/en/model_doc/vision-encoder-decoder#transformers.VisionEncoderDecoderConfig.example
config_encoder = ViTConfig()
config_decoder = ByT5Config() # this is what is desired.
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
model = VisionEncoderDecoderModel(config=config)

Trying something like the following:

from transformers import VisionEncoderDecoderModel
ved = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k", 'google/byt5-small'
)

Throws the following ValueError:

ValueError: Unrecognized configuration class <class 'transformers.models.t5.configuration_t5.T5Config'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, CodeGenConfig, CTRLConfig, Data2VecTextConfig, ElectraConfig, ErnieConfig, GitConfig, GPT2Config, GPT2Config, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, MarianConfig, MBartConfig, MegatronBertConfig, MvpConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, Speech2Text2Config, TransfoXLConfig, TrOCRConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig.

Do any of the models listed above have byte-level or character-level tokenization, so that they can be used with the VisionEncoderDecoderModel API provided by 🤗?

itsbvk avatar Mar 10 '23 21:03 itsbvk

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 09 '23 15:04 github-actions[bot]

Hi @khadiravana-belagavi, this is because T5/ByT5 is an encoder-decoder model. You would only need the decoder to combine it with a vision encoder. The vision encoder-decoder framework doesn't work out-of-the-box with T5/ByT5 at the moment, as this would require defining a new class that includes only the decoder + a language modeling head on top.

Hence I'd recommend defining this class yourself and then providing it as the decoder argument when instantiating a VisionEncoderDecoderModel. The class could roughly look like this:

import copy
from torch import nn
from transformers.models.t5.modeling_t5 import T5PreTrainedModel, T5Stack

class T5DecoderOnlyForCausalLM(T5PreTrainedModel):

    def __init__(self, config):
        super().__init__(config)
        self.shared = nn.Embedding(config.vocab_size, config.d_model)
        # mirror how T5ForConditionalGeneration builds its decoder stack
        decoder_config = copy.deepcopy(config)
        decoder_config.is_decoder = True
        decoder_config.is_encoder_decoder = False
        self.decoder = T5Stack(decoder_config, self.shared)
        self.lm_head = nn.Linear(config.d_model, config.vocab_size, bias=False)
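
To actually use this as a decoder inside a VisionEncoderDecoderModel, a forward method would also be needed; a rough, untested sketch (the argument names and output class below are assumptions based on the usual decoder interface):

from transformers.modeling_outputs import CausalLMOutputWithCrossAttentions

    # method to add inside T5DecoderOnlyForCausalLM (uses `nn` imported above)
    def forward(self, input_ids=None, attention_mask=None,
                encoder_hidden_states=None, encoder_attention_mask=None,
                labels=None, **kwargs):
        # run the target tokens through the decoder stack, cross-attending
        # to the hidden states produced by the vision encoder
        outputs = self.decoder(
            input_ids=input_ids,
            attention_mask=attention_mask,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_attention_mask,
        )
        logits = self.lm_head(outputs[0])
        loss = None
        if labels is not None:
            loss = nn.CrossEntropyLoss()(
                logits.view(-1, logits.size(-1)), labels.view(-1)
            )
        return CausalLMOutputWithCrossAttentions(loss=loss, logits=logits)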

Then you can instantiate the model as follows:

from transformers import VisionEncoderDecoderModel, ViTModel

encoder = ViTModel.from_pretrained("google/vit-base-patch16-224")
decoder = T5DecoderOnlyForCausalLM.from_pretrained("t5-base")

model = VisionEncoderDecoderModel(encoder=encoder, decoder=decoder)

One would also need to check whether the weights of the decoder are properly instantiated; the draft above probably won't load the weights correctly.
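
For example, a rough way to do that could be to copy the decoder-side weights over manually (a sketch, assuming the attribute names above line up with those of T5ForConditionalGeneration):

from transformers import T5ForConditionalGeneration

# load the full pretrained encoder-decoder once, then transfer the decoder-side weights
pretrained = T5ForConditionalGeneration.from_pretrained("t5-base")
decoder = T5DecoderOnlyForCausalLM(pretrained.config)
decoder.shared.load_state_dict(pretrained.shared.state_dict())
decoder.decoder.load_state_dict(pretrained.decoder.state_dict())
decoder.lm_head.load_state_dict(pretrained.lm_head.state_dict())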

NielsRogge avatar Apr 18 '23 12:04 NielsRogge

Thanks @NielsRogge for the detailed response. However, I think this issue is still relevant: although CANINE is not a causal LM, neither is BERT, and the BertLMHeadModel class adds the necessary components for fine-tuning on a CLM task. Or is there anything specific to CANINE, given that CANINE is also pretrained on a similar MLM task?

itsbvk avatar Apr 26 '23 06:04 itsbvk

@khadiravana-belagavi BERT can be adapted to be used as a decoder (by simply using a causal attention mask rather than a bidirectional one). CANINE, on the other hand, cannot simply be adapted to work as a decoder since it uses a different architecture composed of 3 Transformers.
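
For reference, that adaptation is just a matter of config flags; a minimal sketch using bert-base-uncased:

from transformers import BertConfig, BertLMHeadModel

# is_decoder switches BERT to a causal attention mask; add_cross_attention adds
# the cross-attention layers needed when it sits behind an encoder
config = BertConfig.from_pretrained("bert-base-uncased", is_decoder=True, add_cross_attention=True)
decoder = BertLMHeadModel.from_pretrained("bert-base-uncased", config=config)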

NielsRogge avatar Apr 27 '23 13:04 NielsRogge

Hi @NielsRogge, I have been intending to use ByT5 as the decoder too and I am getting the same error. Thanks for providing the method to do so.

from transformers.models.t5.modeling_t5 import T5PreTrainedModel, T5Stack

class T5DecoderOnlyForCausalLM(T5PreTrainedModel):

    def __init__(self, config):
        self.shared = nn.Embedding(config.vocab_size, config.d_model)
        self.decoder = T5Stack(config, self.shared)
        self.lm_head = nn.Linear(config.d_model, config.vocab_size, bias=False)

from transformers import VisionEncoderDecoderModel, ViTModel

encoder = ViTModel.from_pretrained("google/vit-base-patch16-224")
decoder = T5DecoderOnlyForCausalLM.from_pretrained("t5-base")

model = VisionEncoderDecoderModel(encoder=encoder, decoder=decoder)

Can you provide the detailed code or any reference so that I can accurately create the complete T5DecoderOnlyForCausalLM class, with the weights of the decoder properly instantiated?

Biyani404198 avatar Sep 23 '23 05:09 Biyani404198