convert_checkpoint_to_lsg

ColBert models

Open puppetm4st3r opened this issue 1 year ago • 2 comments

Hi, I'm trying to convert a ColBERT model but I get this stack trace/error:

{ "name": "AssertionError", "message": "Provided/config architecture is wrong, make sure it is in:

  • LSGBertModel
  • LSGBertForMaskedLM
  • LSGBertForPreTraining
  • LSGBertLMHeadModel
  • LSGBertForMultipleChoice
  • LSGBertForQuestionAnswering
  • LSGBertForSequenceClassification
  • LSGBertForTokenClassification
  • BertModel
  • BertForMaskedLM
  • BertForPreTraining
  • BertLMHeadModel
  • BertForMultipleChoice
  • BertForQuestionAnswering
  • BertForSequenceClassification
  • BertForTokenClassification", "stack": "---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[1], line 5
      3 converter = LSGConverter(max_sequence_length=4096)
      4 if converter:
----> 5     model, tokenizer = converter.convert_from_pretrained("AdrienB134/ColBERTv1.0-bert-based-spanish-mmarcoES")
      6     print(type(model))

File ~/.local/lib/python3.10/site-packages/lsg_converter/converter.py:96, in LSGConverter.convert_from_pretrained(self, model_name_or_path, architecture, use_auth_token, **model_kwargs)
     78 if model_type in _AUTH_MODELS.keys():
     79     converter = _AUTH_MODELS[model_type](
     80         initial_model=model_name_or_path,
     81         model_name=model_name_or_path,
   (...)
     93         seed=self.seed
     94     )
---> 96 return converter.process()

File ~/.local/lib/python3.10/site-packages/lsg_converter/conversion_utils.py:60, in ConversionScript.process(self)
     58 def process(self):
---> 60     (lsg_architecture, lsg_model), initial_architecture = self.get_architecture()
     61     is_base_architecture, is_lsg, keep_first_global = self.get_additional_params(lsg_architecture, initial_architecture)
     62     model, tokenizer = self.get_model(lsg_architecture, lsg_model)

File ~/.local/lib/python3.10/site-packages/lsg_converter/conversion_utils.py:97, in ConversionScript.get_architecture(self)
     95 if architectures is not None:
     96     architecture = architectures if isinstance(architectures, str) else architectures[0]
---> 97     return self.validate_architecture(architecture)
     99 return self.validate_architecture(self._DEFAULT_ARCHITECTURE_TYPE)

File ~/.local/lib/python3.10/site-packages/lsg_converter/conversion_utils.py:105, in ConversionScript.validate_architecture(self, architecture)
    102 _architecture = self._ARCHITECTURE_TYPE_DICT.get(architecture, None)
    104 s = "\n * " + "\n * ".join([k for k in self._ARCHITECTURE_TYPE_DICT.keys()])
--> 105 assert _architecture is not None, f"Provided/config architecture is wrong, make sure it is in: {s}"
    106 return _architecture, architecture

AssertionError: Provided/config architecture is wrong, make sure it is in:

  • LSGBertModel
  • LSGBertForMaskedLM
  • LSGBertForPreTraining
  • LSGBertLMHeadModel
  • LSGBertForMultipleChoice
  • LSGBertForQuestionAnswering
  • LSGBertForSequenceClassification
  • LSGBertForTokenClassification
  • BertModel
  • BertForMaskedLM
  • BertForPreTraining
  • BertLMHeadModel
  • BertForMultipleChoice
  • BertForQuestionAnswering
  • BertForSequenceClassification
  • BertForTokenClassification" }

Is this architecture not supported?

puppetm4st3r, Feb 11 '24 03:02

The architecture in the config is HF_ColBERT, which is not one of the supported architectures listed in the error message. If this is a BERT model, change:

"architectures": ["HF_ColBERT"]

to

"architectures": ["BertModel"]
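For anyone who prefers to script that config edit, here is a minimal sketch. It assumes the model files have already been downloaded to a local folder and that the replacement should be a name taken from the supported list in the error message (BertModel is used here as a guess for a plain BERT encoder). The helper name patch_architecture is hypothetical and not part of lsg_converter.

```python
import json
from pathlib import Path


def patch_architecture(config_path, new_architecture="BertModel"):
    """Rewrite the "architectures" field of a local config.json in place.

    Hypothetical helper: returns the previous value so it can be restored.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    old = config.get("architectures")
    # Assumption: the converter only needs a name from its supported list here.
    config["architectures"] = [new_architecture]
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return old
```

After patching, point convert_from_pretrained at the local folder instead of the hub name. All other keys in config.json are left untouched.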

Or try passing the architecture explicitly:

model, tokenizer = converter.convert_from_pretrained(
    "AdrienB134/ColBERTv1.0-bert-based-spanish-mmarcoES",
    architecture="BertForMaskedLM",
)
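If you convert several checkpoints, the same decision can be made up front. The sketch below is only an illustration: the set of names is copied from the error message in this thread, while pick_architecture and its fallback default are hypothetical and not part of lsg_converter. Unlike the converter, which per the traceback only looks at the first entry, this helper accepts any supported entry in the list.

```python
# Names copied from the AssertionError message above.
SUPPORTED_BERT_ARCHITECTURES = {
    "LSGBertModel", "LSGBertForMaskedLM", "LSGBertForPreTraining",
    "LSGBertLMHeadModel", "LSGBertForMultipleChoice",
    "LSGBertForQuestionAnswering", "LSGBertForSequenceClassification",
    "LSGBertForTokenClassification",
    "BertModel", "BertForMaskedLM", "BertForPreTraining", "BertLMHeadModel",
    "BertForMultipleChoice", "BertForQuestionAnswering",
    "BertForSequenceClassification", "BertForTokenClassification",
}


def pick_architecture(architectures, fallback="BertForMaskedLM"):
    """Return the first supported name from a config's "architectures" value.

    Hypothetical helper: falls back to `fallback` when nothing matches,
    e.g. for ["HF_ColBERT"].
    """
    if isinstance(architectures, str):
        architectures = [architectures]
    for name in architectures or []:
        if name in SUPPORTED_BERT_ARCHITECTURES:
            return name
    return fallback
```

The returned name can then be passed as the architecture argument of convert_from_pretrained, as in the call above.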

ccdv-ai, Feb 11 '24 12:02

Will try, thanks!

puppetm4st3r, Feb 12 '24 03:02