eland
Add support for RoBERTa models
I am attempting to import the bertweet-base-sentiment-analysis model into Elastic Cloud.
Eland version 8.0.0
Command run:
eland_import_hub_model --url <ELASTIC_ENDPOINT> \
--hub-model-id finiteautomata/bertweet-base-sentiment-analysis \
--task-type text_classification
Full output:
/usr/local/bin/eland_import_hub_model:86: DeprecationWarning: The 'timeout' parameter is deprecated in favor of 'request_timeout'
es = elasticsearch.Elasticsearch(args.url, timeout=300) # 5 minute timeout
Loading HuggingFace transformer tokenizer and model finiteautomata/bertweet-base-sentiment-analysis
Traceback (most recent call last):
  File "/usr/local/bin/eland_import_hub_model", line 115, in <module>
    main()
  File "/usr/local/bin/eland_import_hub_model", line 91, in main
    tm = TransformerModel(args.hub_model_id, args.task_type, args.quantize)
  File "/usr/local/lib/python3.9/site-packages/eland/ml/pytorch/transformers.py", line 383, in __init__
    raise TypeError(
TypeError: Tokenizer type PreTrainedTokenizer(name_or_path='finiteautomata/bertweet-base-sentiment-analysis', vocab_size=64000, model_max_len=128, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'sep_token': '</s>', 'pad_token': '<pad>', 'cls_token': '<s>', 'mask_token': '<mask>'}) not supported, must be one of: <class 'transformers.models.bert.tokenization_bert.BertTokenizer'>, <class 'transformers.models.distilbert.tokenization_distilbert.DistilBertTokenizer'>, <class 'transformers.models.dpr.tokenization_dpr.DPRContextEncoderTokenizer'>, <class 'transformers.models.dpr.tokenization_dpr.DPRQuestionEncoderTokenizer'>, <class 'transformers.models.electra.tokenization_electra.ElectraTokenizer'>, <class 'transformers.models.mobilebert.tokenization_mobilebert.MobileBertTokenizer'>, <class 'transformers.models.mpnet.tokenization_mpnet.MPNetTokenizer'>, <class 'transformers.models.retribert.tokenization_retribert.RetriBertTokenizer'>, <class 'transformers.models.squeezebert.tokenization_squeezebert.SqueezeBertTokenizer'>
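For anyone hitting the same error, the check that fails above can be approximated before running the import. This is only a rough sketch, not eland's actual code: the helper name `is_supported_tokenizer` is hypothetical, the class names are copied from the "must be one of" list in the TypeError, and matching on class names (rather than the real classes) is an assumption made so the snippet runs without `transformers` installed.

```python
# Class names copied from the "must be one of" list in the TypeError above.
SUPPORTED_TOKENIZER_NAMES = frozenset(
    {
        "BertTokenizer",
        "DistilBertTokenizer",
        "DPRContextEncoderTokenizer",
        "DPRQuestionEncoderTokenizer",
        "ElectraTokenizer",
        "MobileBertTokenizer",
        "MPNetTokenizer",
        "RetriBertTokenizer",
        "SqueezeBertTokenizer",
    }
)


def is_supported_tokenizer(tokenizer) -> bool:
    """Hypothetical helper: return True if the tokenizer's class, or any of
    its base classes, has a name that appears in the supported list.

    Walking the MRO roughly mimics an isinstance() check, but compares
    names only, so it is an approximation.
    """
    return any(
        cls.__name__ in SUPPORTED_TOKENIZER_NAMES
        for cls in type(tokenizer).__mro__
    )
```

With `transformers` installed, the tokenizer could be loaded with `AutoTokenizer.from_pretrained("finiteautomata/bertweet-base-sentiment-analysis")` and passed to this helper; given the error above, it would presumably return False for this model on Eland 8.0.0.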
@kylemclaren RoBERTa models are not yet supported. Version 8.0.0 supports only BERT and DistilBERT models; support for MPNet will be added in version 8.1.0.
RoBERTa is on our todo list.
Support for RoBERTa models was added to Elasticsearch in version 8.2: https://github.com/elastic/elasticsearch/pull/84777
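A quick way to check whether a given Elasticsearch version is at or above that release is sketched below. The helper name `supports_roberta` is hypothetical, and the check covers only the Elasticsearch version mentioned above; it assumes a plain `major.minor[.patch]` version string and says nothing about which Eland version is also required.

```python
# Minimum Elasticsearch version with RoBERTa support, per the comment above.
ROBERTA_MIN_VERSION = (8, 2)


def supports_roberta(es_version: str) -> bool:
    """Hypothetical helper: compare the major.minor part of a version string
    such as "8.0.0" against the 8.2 minimum."""
    major, minor = (int(part) for part in es_version.split(".")[:2])
    return (major, minor) >= ROBERTA_MIN_VERSION
```

With the official Python client, the version string can be read from `es.info()["version"]["number"]` and passed to the helper.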