h2o-llmstudio
[Feature] Allow fine tuning T5 models?
🚀 Feature
I'd like to experiment with fine-tuning small T5-based models, but it looks like the code makes some assumptions about the type of model and the configuration class required. I haven't investigated yet, so I don't know how big an ask this is.
Motivation
I'm training one-off models for specific tasks rather than instruction- or chat-tuned models, and I think the T5 family of models would be faster and more appropriate. But I like the boilerplate and config handling in LLM Studio.
Thanks @Taytay -
we will need to introduce an option for https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForSeq2SeqLM
Also need to check how the rest of the API for it matches the current implementations.
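A minimal sketch of that option, assuming `transformers` is installed (the dictionary and its keys are mine for illustration, not LLM Studio's actual config layout):

```python
# Decoder-only backbones (GPT-style) keep the current loading path via
# AutoModelForCausalLM; encoder-decoder backbones (T5, BART, ...) would go
# through AutoModelForSeq2SeqLM instead. Both expose the same
# `.from_pretrained(name, ...)` entry point, so the surrounding loading
# code could stay largely unchanged.
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM

AUTO_CLASSES = {
    "causal": AutoModelForCausalLM,
    "seq2seq": AutoModelForSeq2SeqLM,
}

# e.g. backbone = AUTO_CLASSES["seq2seq"].from_pretrained("t5-small")
```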
Heyy @psinger, I would like to take a crack at this!
Hi @psinger, I figured out that we can add models by adding a model preset here:
https://github.com/h2oai/h2o-llmstudio/blob/be2f2b155dd79ab6fd1f9fd85c5b410c58b33475/llm_studio/python_configs/text_causal_language_modeling_config.py#L492-L499
and then LLM Studio loads the backbone with AutoModel.from_pretrained(<preset-name>). So I added `t5-small` to the `ConfigProblemBase` class and ran LLM Studio. I created an experiment with the default dataset, and when I ran it the following error was thrown:
Error
```
2023-07-04 22:41:46,555 - INFO: Number of observations in validation dataset: 83
Downloading (…)lve/main/config.json: 100%|███████████████████████████████| 1.21k/1.21k [00:00
    run(cfg=cfg)
  File "/home/shivance/Desktop/h2o-llmstudio/train.py", line 594, in run
    model = cfg.architecture.model_class(cfg)
  File "/home/shivance/Desktop/h2o-llmstudio/llm_studio/src/models/text_causal_language_modeling_model.py", line 141, in __init__
    self.backbone, self.backbone_config = create_nlp_backbone(
  File "/home/shivance/Desktop/h2o-llmstudio/llm_studio/src/utils/modeling_utils.py", line 526, in create_nlp_backbone
    backbone = model_class.from_pretrained(
  File "/home/shivance/anaconda3/envs/h2o/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 487, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class
```
My thoughts: the models we are using as of now go through the `AutoModelForCausalLM` base class, while T5 doesn't implement that; the class it has is `T5ForConditionalGeneration`.
So how should I tell LLM Studio to use this class to extract the backbone for T5 models?
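One possible answer, sketched outside the real `create_nlp_backbone`: dispatch on the backbone config's `is_encoder_decoder` flag, which `transformers` sets to `True` for T5-style models (`T5ForConditionalGeneration` is registered under `AutoModelForSeq2SeqLM`, not `AutoModelForCausalLM`, hence the `ValueError` above). The function name and the stand-in configs below are illustrative assumptions, not LLM Studio code:

```python
from types import SimpleNamespace

def pick_auto_class_name(backbone_config) -> str:
    """Choose which transformers Auto* class should load this backbone."""
    # `is_encoder_decoder` is a standard field on transformers configs:
    # True for seq2seq models like T5/BART, False for decoder-only LMs.
    if getattr(backbone_config, "is_encoder_decoder", False):
        return "AutoModelForSeq2SeqLM"
    return "AutoModelForCausalLM"

# Stand-ins for real PretrainedConfig objects, which carry the same flag:
t5_like = SimpleNamespace(is_encoder_decoder=True)
gpt_like = SimpleNamespace(is_encoder_decoder=False)

print(pick_auto_class_name(t5_like))   # AutoModelForSeq2SeqLM
print(pick_auto_class_name(gpt_like))  # AutoModelForCausalLM
```

The same check could live wherever `model_class` is currently chosen, so the rest of the training loop stays untouched.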
> Thanks @Taytay -
> we will need to introduce an option for https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForSeq2SeqLM
> Also need to check how the rest of the API for it matches the current implementations.

Oh, just saw that's what you mentioned above.
Thanks @psinger!