autotrain-advanced
valid_data is set to None when no chat template is applied
Even when I don't want to use a chat template, this function is still called and sets valid_data to None, which results in the downstream error below:
https://github.com/huggingface/autotrain-advanced/blob/01673f192f56439f083fc8f84a414d62eb2f5d28/src/autotrain/trainers/clm/utils.py#L443-L463
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO | 2024-06-27 12:54:16 | autotrain.trainers.clm.utils:configure_logging_steps:467 - configuring logging steps
ERROR | 2024-06-27 12:54:16 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/__main__.py", line 28, in train
    train_sft(config)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/train_clm_sft.py", line 19, in train
    logging_steps = utils.configure_logging_steps(config, train_data, valid_data)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/utils.py", line 470, in configure_logging_steps
    logging_steps = int(0.2 * len(valid_data) / config.batch_size)
TypeError: object of type 'NoneType' has no len()
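A possible fix, sketched under assumptions: the chat-template step should pass valid_data through unchanged when no template is requested, and configure_logging_steps should not call len() on a missing validation split. The two functions below are simplified, hypothetical stand-ins that only reuse the names from the traceback; they are not autotrain's actual code. The config fields (chat_template, logging_steps == -1 meaning "auto", batch_size), the placeholder template transform, and the minimum of 1 logging step are all assumptions for illustration.

```python
from types import SimpleNamespace


def process_data_with_chat_template(config, train_data, valid_data):
    """Hypothetical stand-in: only touch the data when a chat template is set.

    Crucially, valid_data is never reset to None; it is returned as-is
    when no template is requested.
    """
    if config.chat_template is None:
        return train_data, valid_data
    # Placeholder transform standing in for applying a real chat template.
    train_data = [f"<chat>{row}</chat>" for row in train_data]
    if valid_data is not None:
        valid_data = [f"<chat>{row}</chat>" for row in valid_data]
    return train_data, valid_data


def configure_logging_steps(config, train_data, valid_data):
    """Hypothetical guard: size logging steps from valid_data only if it exists."""
    if config.logging_steps != -1:
        # An explicit value was configured; use it unchanged.
        return config.logging_steps
    # Fall back to the training split when there is no validation data.
    reference = valid_data if valid_data is not None else train_data
    logging_steps = int(0.2 * len(reference) / config.batch_size)
    # Assumed lower bound so tiny datasets never yield 0 logging steps.
    return max(1, logging_steps)


config = SimpleNamespace(chat_template=None, logging_steps=-1, batch_size=2)
train_data, valid_data = process_data_with_chat_template(config, ["a"] * 100, ["b"] * 20)
print(configure_logging_steps(config, train_data, valid_data))  # 2
print(configure_logging_steps(config, train_data, None))  # 10
```

Either change alone avoids the TypeError above: keeping valid_data intact fixes the root cause, while the fallback in configure_logging_steps makes the logging step robust to any other path that yields no validation split.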