
[Feature] Allow fine tuning T5 models?

Open · Taytay opened this issue 2 years ago · 1 comment

🚀 Feature

I'd like to experiment with fine-tuning small T5-based models, but it looks like the code makes some assumptions about the type of model and the configuration class required. I haven't investigated yet, so I don't know how big of an ask this is.

Motivation

I'm training one-off models for specific tasks rather than instruction- or chat-tuned models, and I think the T5 family would be faster and more appropriate. But I like the boilerplate and config in LLM Studio.

Taytay avatar May 08 '23 16:05 Taytay

Thanks @Taytay -

we will need to introduce an option for https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForSeq2SeqLM

Also need to check how the rest of the API for it matches the current implementations.
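A rough sketch of the split being described: decoder-only backbones go through `AutoModelForCausalLM`, while encoder-decoder families like T5 need `AutoModelForSeq2SeqLM` (which resolves to `T5ForConditionalGeneration` for T5 checkpoints). The mapping below is illustrative only, not actual LLM Studio code:

```python
# Hypothetical mapping from Hugging Face model_type to the auto class that can
# load it. "t5" -> AutoModelForSeq2SeqLM is the case this issue is about.
BACKBONE_AUTO_CLASS = {
    "gpt2": "AutoModelForCausalLM",
    "llama": "AutoModelForCausalLM",
    "t5": "AutoModelForSeq2SeqLM",   # T5ForConditionalGeneration under the hood
    "mt5": "AutoModelForSeq2SeqLM",
}

print(BACKBONE_AUTO_CLASS["t5"])  # → AutoModelForSeq2SeqLM
```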

psinger avatar May 08 '23 19:05 psinger

Heyy @psinger, I would like to take a crack at this!

kanpuriyanawab avatar Jul 03 '23 10:07 kanpuriyanawab

Hi @psinger, I figured out that we can add models by adding a model preset here:

https://github.com/h2oai/h2o-llmstudio/blob/be2f2b155dd79ab6fd1f9fd85c5b410c58b33475/llm_studio/python_configs/text_causal_language_modeling_config.py#L492-L499

and then LLM Studio loads the backbone with `AutoModel.from_pretrained(<preset-name>)`. Hence

I added `t5-small` to this `ConfigProblemBase` class and ran LLM Studio. When I created an experiment with the default dataset and ran it, the following error was thrown:

Error

```
2023-07-04 22:41:46,555 - INFO: Number of observations in validation dataset: 83
Downloading (…)lve/main/config.json: 100%|███████████████████████████████| 1.21k/1.21k [00:00
    run(cfg=cfg)
  File "/home/shivance/Desktop/h2o-llmstudio/train.py", line 594, in run
    model = cfg.architecture.model_class(cfg)
  File "/home/shivance/Desktop/h2o-llmstudio/llm_studio/src/models/text_causal_language_modeling_model.py", line 141, in __init__
    self.backbone, self.backbone_config = create_nlp_backbone(
  File "/home/shivance/Desktop/h2o-llmstudio/llm_studio/src/utils/modeling_utils.py", line 526, in create_nlp_backbone
    backbone = model_class.from_pretrained(
  File "/home/shivance/anaconda3/envs/h2o/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 487, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class for this kind of AutoModel: AutoModelForCausalLM. Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, CodeGenConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, ElectraConfig, ErnieConfig, GitConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, LlamaConfig, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MvpConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, TransfoXLConfig, TrOCRConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig.
```

My thoughts: the models we use as of now are loaded through the `AutoModelForCausalLM` base class, but T5 doesn't implement that; the class it has is `T5ForConditionalGeneration`.

So how should I tell LLM Studio that, for T5 models, it should use this class to load the backbone?

kanpuriyanawab avatar Jul 04 '23 17:07 kanpuriyanawab

> Thanks @Taytay -
>
> we will need to introduce an option for https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForSeq2SeqLM
>
> Also need to check how the rest of the API for it matches the current implementations.

Oh, just saw: that's what you mentioned above.

kanpuriyanawab avatar Jul 04 '23 17:07 kanpuriyanawab

Thanks @psinger!

Taytay avatar Oct 09 '23 09:10 Taytay