autotrain-advanced
[BUG] Valid split will trigger AttributeError: 'NoneType' object has no attribute 'map'
Prerequisites
- [X] I have read the documentation.
- [X] I have checked other issues for similar problems.
Backend
Local
Interface Used
CLI
CLI Command
'model': 'meta-llama/Meta-Llama-3-8B-Instruct', 'project_name': 'Meta-Ll-UMLS-Co-2-0-0001', 'data_path': 'XXXX/UMLS_Concept_train', 'train_split': 'train', 'valid_split': 'valid', 'add_eos_token': False, 'block_size': 4096, 'model_max_length': 8192, 'padding': None, 'trainer': 'default', 'use_flash_attention_2': False, 'log': 'none', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'evaluation_strategy': 'epoch', 'save_total_limit': 2, 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'lr': 0.0001, 'epochs': 2, 'batch_size': 4, 'warmup_ratio': 0.1, 'gradient_accumulation': 4, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.01, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'tokenizer', 'quantization': None, 'target_modules': 'all-linear', 'merge_adapter': True, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': 'prompt', 'text_column': 'conversations', 'rejected_text_column': 'rejected', 'push_to_hub': True, 'username': 'XXXX', 'token': '*****'}
UI Screenshots & Parameters
No response
Error Logs
ERROR | 2024-05-08 22:21:35 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "/blue/yonghui.wu/qx68/autotrain-conda/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/blue/yonghui.wu/qx68/autotrain-conda/lib/python3.10/site-packages/autotrain/trainers/clm/main.py", line 23, in train
    train_default(config)
  File "/blue/yonghui.wu/qx68/autotrain-conda/lib/python3.10/site-packages/autotrain/trainers/clm/train_clm_default.py", line 40, in train
    train_data, valid_data = utils.process_data_with_chat_template(config, tokenizer, train_data, valid_data)
  File "/blue/yonghui.wu/qx68/autotrain-conda/lib/python3.10/site-packages/autotrain/trainers/clm/utils.py", line 414, in process_data_with_chat_template
    valid_data = valid_data.map(
AttributeError: 'NoneType' object has no attribute 'map'
Additional Information
Why is there a
valid_data = None
in the clm utils?
It looks like it will always trigger the AttributeError above whenever valid_split is specified.
We still need to allow valid_split for LLM tasks. Currently, validation is disabled for LLM fine-tuning because it's not representative.
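For illustration, the traceback boils down to calling .map on a validation set that is None. Below is a minimal sketch of a guard that would avoid that particular crash. It is not the actual autotrain code: TinyDataset, add_suffix and process_splits are hypothetical names, and TinyDataset is only a stand-in for a dataset object that exposes .map().

```python
from typing import Callable, Optional, Tuple


class TinyDataset:
    """Hypothetical stand-in for a dataset object that exposes .map()."""

    def __init__(self, rows):
        self.rows = list(rows)

    def map(self, fn: Callable[[dict], dict]) -> "TinyDataset":
        # Apply fn to a copy of every row and return a new dataset.
        return TinyDataset(fn(dict(row)) for row in self.rows)


def add_suffix(row: dict) -> dict:
    """Hypothetical per-row transform, standing in for chat-template processing."""
    row["text"] = row["text"] + " <end>"
    return row


def process_splits(
    train_data: TinyDataset,
    valid_data: Optional[TinyDataset],
    fn: Callable[[dict], dict],
) -> Tuple[TinyDataset, Optional[TinyDataset]]:
    """Map fn over the train split, and over the valid split only if it exists."""
    train_data = train_data.map(fn)
    if valid_data is not None:
        # Without this check, valid_data.map raises AttributeError on None.
        valid_data = valid_data.map(fn)
    return train_data, valid_data


if __name__ == "__main__":
    train = TinyDataset([{"text": "hello"}])
    new_train, new_valid = process_splits(train, None, add_suffix)
    print(new_train.rows, new_valid)
```

A guard like this only prevents the crash. It does not enable validation, which per the reply above is currently disabled for LLM fine-tuning.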
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 20 days since being marked as stale.