
[BUG] Duplicate Flag Generation: __main__.py: error: unrecognized arguments: --mixed_precision bf16 -m autotrain.trainers.clm

Open unclemusclez opened this issue 1 year ago • 2 comments

Prerequisites

  • [X] I have read the documentation.
  • [X] I have checked other issues for similar problems.

Backend

Local

Interface Used

CLI

CLI Command

autotrain app --host 0.0.0.0 --port 7000

UI Screenshots & Parameters

No response

Error Logs

__main__.py: error: unrecognized arguments: --mixed_precision bf16 -m autotrain.trainers.clm --mixed_precision bf16 -m autotrain.trainers.clm --mixed_precision fp16 -m autotrain.trainers.clm --mixed_precision fp16 -m autotrain.trainers.clm

INFO     | 2024-10-19 23:01:18 | autotrain.commands:launch_command:524 - {'model': 'unsloth/Qwen2.5-Coder-7B-Instruct', 'project_name': 'autotrain-126tb-pvpyu4', 'data_path': 'skratos115/opendevin_DataDevinator', 'train_split': 'train', 'valid_split': None, 'add_eos_token': True, 'block_size': 2048, 'model_max_length': 2048, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'eval_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'lr': 1e-06, 'epochs': 1, 'batch_size': 1, 'warmup_ratio': 0.1, 'gradient_accumulation': 4, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'none', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': 'prompt', 'text_column': 'text', 'rejected_text_column': 'rejected_text', 'push_to_hub': True, 'username': 'unclemusclez', 'token': '*****', 'unsloth': True, 'distributed_backend': 'none'}
INFO     | 2024-10-19 23:01:18 | autotrain.backends.local:create:25 - Training PID: 57326
INFO:     192.168.2.69:65250 - "POST /ui/create_project HTTP/1.1" 200 OK
INFO:     192.168.2.69:65250 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO:     192.168.2.69:65250 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO:     192.168.2.69:65250 - "GET /ui/accelerators HTTP/1.1" 200 OK
usage: __main__.py [-h] --training_config TRAINING_CONFIG
__main__.py: error: unrecognized arguments: --mixed_precision bf16 -m autotrain.trainers.clm --mixed_precision bf16 -m autotrain.trainers.clm --mixed_precision fp16 -m autotrain.trainers.clm --mixed_precision fp16 -m autotrain.trainers.clm
Traceback (most recent call last):
  File "/usr/local/open-webui/.venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/open-webui/.venv/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/usr/local/open-webui/.venv/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1174, in launch_command
    simple_launcher(args)
  File "/usr/local/open-webui/.venv/lib/python3.12/site-packages/accelerate/commands/launch.py", line 769, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/local/open-webui/.venv/bin/python', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-126tb-pvpyu3/training_params.json', '--mixed_precision', 'bf16', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-126tb-pvpyu3/training_params.json', '--mixed_precision', 'bf16', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-126tb-pvpyu3/training_params.json', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-126tb-pvpyu4/training_params.json', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-126tb-pvpyu4/training_params.json']' returned non-zero exit status 2.
INFO:     192.168.2.69:65250 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO     | 2024-10-19 23:01:34 | autotrain.app.utils:get_running_jobs:40 - Killing PID: 57326
INFO     | 2024-10-19 23:01:34 | autotrain.app.utils:kill_process_by_pid:90 - Sent SIGTERM to process with PID 57326

Additional Information

Running the Local backend, the launcher seems to double up the flags, and each subsequent training run appends another copy of them to the command.
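For context, the pattern in the traceback (the same `--mixed_precision ... -m autotrain.trainers.clm` arguments repeated once more per run) is what you get when launch arguments are appended to a list object that survives between launches. The sketch below is purely hypothetical, not actual autotrain code; the function and variable names are invented to illustrate the suspected mechanism.

```python
# Hypothetical illustration of the suspected bug: extending a shared,
# long-lived list accumulates one extra copy of the flags per launch.
BASE_CMD = ["accelerate", "launch"]  # shared across every launch

def build_buggy(precision: str) -> list:
    # Bug: mutates BASE_CMD in place, so flags pile up across calls.
    BASE_CMD.extend(["--mixed_precision", precision, "-m", "autotrain.trainers.clm"])
    return BASE_CMD

def build_fixed(precision: str) -> list:
    # Fix: build a fresh list each time so every launch starts clean.
    return ["accelerate", "launch",
            "--mixed_precision", precision, "-m", "autotrain.trainers.clm"]

first = build_buggy("bf16")
second = build_buggy("fp16")
# After the second call the command carries both the bf16 and fp16
# flags, matching the "unrecognized arguments" error in the logs.
print(second.count("--mixed_precision"))  # 2
print(build_fixed("fp16").count("--mixed_precision"))  # 1
```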

unclemusclez · Oct 19 '24 23:10