
Valid Data is set to None when not applying chat template

Open luisblanche-mirakl opened this issue 7 months ago • 5 comments

Even if I don't want to use a chat template, this function is still called and sets valid_data to None, which results in the downstream error below: https://github.com/huggingface/autotrain-advanced/blob/01673f192f56439f083fc8f84a414d62eb2f5d28/src/autotrain/trainers/clm/utils.py#L443-L463

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO     | 2024-06-27 12:54:16 | autotrain.trainers.clm.utils:configure_logging_steps:467 - configuring logging steps
ERROR    | 2024-06-27 12:54:16 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/__main__.py", line 28, in train
    train_sft(config)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/train_clm_sft.py", line 19, in train
    logging_steps = utils.configure_logging_steps(config, train_data, valid_data)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/utils.py", line 470, in configure_logging_steps
    logging_steps = int(0.2 * len(valid_data) / config.batch_size)
TypeError: object of type 'NoneType' has no len()
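
The failing line in the traceback calls `len(valid_data)` unconditionally. One possible guard is sketched below. This is only an illustration: `safe_logging_steps` is a hypothetical helper, not a function in autotrain, and falling back to `train_data` and clamping the result to at least 1 are my assumptions about reasonable behavior, not the maintainers' fix.

```python
from typing import Optional, Sized


def safe_logging_steps(batch_size: int, train_data: Sized, valid_data: Optional[Sized]) -> int:
    """Hypothetical helper (not part of autotrain).

    Uses the same 0.2 * len(data) / batch_size formula as the line in the
    traceback, but tolerates valid_data being None.
    """
    # Assumption: when there is no validation set, size the logging interval
    # from the training set instead of calling len(None).
    data = valid_data if valid_data is not None else train_data
    steps = int(0.2 * len(data) / batch_size)
    # Assumption: never return 0, so that logging still happens on tiny datasets.
    return max(steps, 1)
```

With this guard, `safe_logging_steps(2, train_data, None)` returns a value derived from `train_data` instead of raising the `TypeError` above. It only hides the symptom, though: the function linked at the top would still set valid_data to None.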

luisblanche-mirakl, Jun 27 '24 13:06