[BUG] Valid split will trigger AttributeError: 'NoneType' object has no attribute 'map'

Open jiminHuang opened this issue 2 months ago • 1 comments

Prerequisites

  • [X] I have read the documentation.
  • [X] I have checked other issues for similar problems.

Backend

Local

Interface Used

CLI

CLI Command

{'model': 'meta-llama/Meta-Llama-3-8B-Instruct', 'project_name': 'Meta-Ll-UMLS-Co-2-0-0001', 'data_path': 'XXXX/UMLS_Concept_train', 'train_split': 'train', 'valid_split': 'valid', 'add_eos_token': False, 'block_size': 4096, 'model_max_length': 8192, 'padding': None, 'trainer': 'default', 'use_flash_attention_2': False, 'log': 'none', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'evaluation_strategy': 'epoch', 'save_total_limit': 2, 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'lr': 0.0001, 'epochs': 2, 'batch_size': 4, 'warmup_ratio': 0.1, 'gradient_accumulation': 4, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.01, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'tokenizer', 'quantization': None, 'target_modules': 'all-linear', 'merge_adapter': True, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': 'prompt', 'text_column': 'conversations', 'rejected_text_column': 'rejected', 'push_to_hub': True, 'username': 'XXXX', 'token': '*****'}

UI Screenshots & Parameters

No response

Error Logs

ERROR | 2024-05-08 22:21:35 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "/blue/yonghui.wu/qx68/autotrain-conda/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/blue/yonghui.wu/qx68/autotrain-conda/lib/python3.10/site-packages/autotrain/trainers/clm/main.py", line 23, in train
    train_default(config)
  File "/blue/yonghui.wu/qx68/autotrain-conda/lib/python3.10/site-packages/autotrain/trainers/clm/train_clm_default.py", line 40, in train
    train_data, valid_data = utils.process_data_with_chat_template(config, tokenizer, train_data, valid_data)
  File "/blue/yonghui.wu/qx68/autotrain-conda/lib/python3.10/site-packages/autotrain/trainers/clm/utils.py", line 414, in process_data_with_chat_template
    valid_data = valid_data.map(
AttributeError: 'NoneType' object has no attribute 'map'

Additional Information

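Why is there this line:

valid_data = None

in the CLM utils (autotrain/trainers/clm/utils.py)?

It looks like it will always trigger the AttributeError above whenever valid_split is specified, because valid_data.map( is then called on None.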
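
To illustrate the suspected failure mode, here is a minimal sketch. It is not the actual autotrain source: TinyDataset, apply_template, process_buggy and process_guarded are hypothetical names made up for this example, and the control flow is only my assumption about what happens around the valid_data = None line. It shows how resetting the validation data before calling .map() would produce this exact error, and one possible guard.

```python
class TinyDataset:
    """Hypothetical stand-in for a dataset object that exposes .map()."""

    def __init__(self, rows):
        self.rows = list(rows)

    def map(self, fn):
        # Apply fn to every row and return a new dataset.
        return TinyDataset(fn(row) for row in self.rows)


def apply_template(row):
    # Placeholder for chat-template formatting (assumption, not the real logic).
    return {"text": f"<chat>{row['conversations']}</chat>"}


def process_buggy(valid_split, train_data, valid_data):
    train_data = train_data.map(apply_template)
    valid_data = None  # the reset questioned in this report
    if valid_split is not None:
        # valid_data is None here, so this raises AttributeError.
        valid_data = valid_data.map(apply_template)
    return train_data, valid_data


def process_guarded(valid_split, train_data, valid_data):
    train_data = train_data.map(apply_template)
    # Only map the validation data when it was requested and actually loaded.
    if valid_split is not None and valid_data is not None:
        valid_data = valid_data.map(apply_template)
    else:
        valid_data = None
    return train_data, valid_data
```

With valid_split set, process_buggy fails the same way as the traceback above, while process_guarded maps the validation rows and still returns None when no validation split is given.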

jiminHuang, May 09 '24 02:05