Add support for RoBERTa models

Open kylemclaren opened this issue 3 years ago • 1 comments
Attempting to import the bertweet-base-sentiment-analysis model into Elastic Cloud.

Eland version 8.0.0

Command run:

eland_import_hub_model --url <ELASTIC_ENDPOINT> \
--hub-model-id finiteautomata/bertweet-base-sentiment-analysis \
--task-type text_classification

Full output:

/usr/local/bin/eland_import_hub_model:86: DeprecationWarning: The 'timeout' parameter is deprecated in favor of 'request_timeout'
  es = elasticsearch.Elasticsearch(args.url, timeout=300)  # 5 minute timeout
Loading HuggingFace transformer tokenizer and model finiteautomata/bertweet-base-sentiment-analysis
Traceback (most recent call last):
  File "/usr/local/bin/eland_import_hub_model", line 115, in <module>
    main()
  File "/usr/local/bin/eland_import_hub_model", line 91, in main
    tm = TransformerModel(args.hub_model_id, args.task_type, args.quantize)
  File "/usr/local/lib/python3.9/site-packages/eland/ml/pytorch/transformers.py", line 383, in __init__
    raise TypeError(
TypeError: Tokenizer type PreTrainedTokenizer(name_or_path='finiteautomata/bertweet-base-sentiment-analysis', vocab_size=64000, model_max_len=128, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'sep_token': '</s>', 'pad_token': '<pad>', 'cls_token': '<s>', 'mask_token': '<mask>'}) not supported, must be one of: <class 'transformers.models.bert.tokenization_bert.BertTokenizer'>, <class 'transformers.models.distilbert.tokenization_distilbert.DistilBertTokenizer'>, <class 'transformers.models.dpr.tokenization_dpr.DPRContextEncoderTokenizer'>, <class 'transformers.models.dpr.tokenization_dpr.DPRQuestionEncoderTokenizer'>, <class 'transformers.models.electra.tokenization_electra.ElectraTokenizer'>, <class 'transformers.models.mobilebert.tokenization_mobilebert.MobileBertTokenizer'>, <class 'transformers.models.mpnet.tokenization_mpnet.MPNetTokenizer'>, <class 'transformers.models.retribert.tokenization_retribert.RetriBertTokenizer'>, <class 'transformers.models.squeezebert.tokenization_squeezebert.SqueezeBertTokenizer'>

kylemclaren, Feb 13 '22 20:02

@kylemclaren RoBERTa models are not yet supported. Version 8.0.0 supports only BERT and DistilBERT models. In version 8.1.0, we will add support for MPNet.

RoBERTa is on our todo list.

benwtrent, Feb 14 '22 12:02

Support for RoBERTa models was added to Elasticsearch in 8.2 https://github.com/elastic/elasticsearch/pull/84777

davidkyle, Aug 24 '23 10:08
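
For readers wondering where the TypeError in the traceback comes from: the message suggests that the import script compares the loaded tokenizer against a fixed list of supported tokenizer classes and rejects anything else. The sketch below illustrates that kind of allow-list check. It is not Eland's actual code; the classes are empty stand-ins defined here so the example runs without installing transformers, and `check_tokenizer` and `SUPPORTED_TOKENIZERS` are hypothetical names.

```python
# Illustrative sketch only, NOT Eland's implementation.
# The classes below are empty stand-ins (assumptions) for real tokenizer classes.


class PreTrainedTokenizer:
    """Stand-in base class for any tokenizer."""


class BertTokenizer(PreTrainedTokenizer):
    """Stand-in for a supported tokenizer type."""


class DistilBertTokenizer(PreTrainedTokenizer):
    """Stand-in for another supported tokenizer type."""


class UnsupportedTokenizer(PreTrainedTokenizer):
    """Stand-in for a tokenizer type that is not on the allow-list."""


# Hypothetical allow-list; the real list in 8.0.0 is the one printed in the error.
SUPPORTED_TOKENIZERS = (BertTokenizer, DistilBertTokenizer)


def check_tokenizer(tokenizer):
    """Return the tokenizer if its type is on the allow-list, else raise TypeError."""
    if not isinstance(tokenizer, SUPPORTED_TOKENIZERS):
        names = ", ".join(cls.__name__ for cls in SUPPORTED_TOKENIZERS)
        raise TypeError(
            f"Tokenizer type {type(tokenizer).__name__} not supported, "
            f"must be one of: {names}"
        )
    return tokenizer


if __name__ == "__main__":
    check_tokenizer(BertTokenizer())  # accepted
    try:
        check_tokenizer(UnsupportedTokenizer())
    except TypeError as err:
        print(err)
```

With a check of this shape, a model whose tokenizer class is missing from the list fails before anything is uploaded, which matches the traceback above: the error is raised while constructing `TransformerModel`, right after the tokenizer and model are loaded.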