
No longer able to fine-tune with version 0.12.0

Open · itsPreto opened this issue 8 months ago · 1 comment


macOS 14.7.1 (23H222)

The job ran for a few minutes, then crashed.

Here's the output:

-- RUN 2025-04-08 13:44:16--
Plugin dir: /Users/<>/.transformerlab/workspace/plugins/mlx_lora_trainer
Arguments:
Namespace(input_file='/Users/<>/.transformerlab/workspace/temp/plugin_input_29.json')
Input:
{
    "experiment": {
        "id": 1,
        "name": "alpha",
        "config": {
            "foundation": "mlx-community/Qwen2.5-7B-Instruct-4bit",
            "adaptor": "",
            "foundation_model_architecture": "MLX",
            "foundation_filename": "",
            "generationParams": "{\"temperature\": 0.7, \"maxTokens\": 1024, \"topP\": 1.0, \"frequencyPenalty\": 0.0}",
            "inferenceParams": {
                "inferenceEngine": "mlx_server",
                "inferenceEngineFriendlyName": ""
            },
            "prompt_template": {
                "system_message": "You are a helpful assistant that matches user speech transcriptions to available commands. Identify if the user's transcribed speech matches one of the available commands. Return only the command name or 'no_match' if no command matches.\n\nAvailable commands:\n- grab_handle: grab handle down, grab handle up, lower grab handle, raise grab handle, handle down, handle up, drop handle, lift handle\n- toilet_flush: flush toilet, toilet flush, flush, flush the toilet\n- trash_lid: open trash, close trash, trash open, trash close, trash lid open, trash lid close, open trash bin, close trash bin, open bin, close bin\n- attendant_call: call attendant, attendant call, call flight attendant, call for help, call for service, call for assistance\n\nExample 1:\nUser transcription: \"flush toilet\"\nCommand: toilet_flush\n\nExample 2:\nUser transcription: \"um can you please open the trash\"\nCommand: trash_lid\n\nExample 3:\nUser transcription: \"I need some help with my meal\"\nCommand: attendant_call\n\nExample 4:\nUser transcription: \"help me flushh the toilett\"\nCommand: toilet_flush\n\nNow identify the command in the following user transcription:"
            },
            "embedding_model": "BAAI/bge-base-en-v1.5",
            "embedding_model_filename": "",
            "embedding_model_architecture": "BertModel"
        },
        "created_at": "2025-02-08 04:06:37",
        "updated_at": "2025-02-08 04:06:37"
    },
    "config": {
        "template_name": "StoicGroundedReasoner",
        "plugin_name": "mlx_lora_trainer",
        "model_name": "mlx-community/Qwen2.5-7B-Instruct-4bit",
        "model_architecture": "MLX",
        "foundation_model_file_path": "",
        "embedding_model": "BAAI/bge-base-en-v1.5",
        "embedding_model_architecture": "BertModel",
        "embedding_model_file_path": "",
        "formatting_template": "User: {{input}}\nAssistant thinking: {{reasoning}}\nAssistant: {{response}}",
        "dataset_name": "stoic_grounded_reasoning",
        "lora_layers": "16",
        "batch_size": "4",
        "learning_rate": "0.00005",
        "lora_rank": "8",
        "lora_alpha": "16",
        "iters": "1000",
        "steps_per_report": "100",
        "steps_per_eval": "200",
        "save_every": "100",
        "adaptor_name": "adaptor",
        "fuse_model": "on",
        "type": "LoRA",
        "job_id": 29,
        "adaptor_output_dir": "/Users/<>/.transformerlab/workspace/adaptors/mlx-community_Qwen2.5-7B-Instruct-4bit/adaptor",
        "output_dir": "/Users/<>/.transformerlab/workspace/experiments/alpha/tensorboards/StoicGroundedReasoner"
    }
}
LoRA config:
{'lora_parameters': {'alpha': '16', 'rank': '8', 'scale': 2.0, 'dropout': 0}}
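As an aside on the config above: the `scale` value of 2.0 appears to be derived from the supplied `lora_alpha` and `lora_rank` (this derivation is the standard LoRA convention, not something stated in the log itself):

```python
# LoRA scaling is conventionally alpha / rank; with the values from this run
# that reproduces the 'scale': 2.0 shown in the logged config.
alpha, rank = 16, 8
scale = alpha / rank  # 2.0
```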
No validation slice found in dataset /Users/<>/.transformerlab/workspace/datasets/stoic_grounded_reasoning:
Using a default 80/10/10 split for training, test and valid.
Loaded train dataset with 118 examples.
Loaded valid dataset with 15 examples.
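The logged counts are consistent with the default 80/10/10 split applied to a dataset of roughly 148 rows. This is back-of-the-envelope arithmetic, not the plugin's actual code; the total row count and the rounding behavior are inferred from the log:

```python
# Rough arithmetic behind the default 80/10/10 split (the plugin's exact
# rounding is an assumption; the total is inferred from the logged counts).
n = 148                     # assumed total rows: 118 / 0.8 = 147.5, so ~148
train = int(n * 0.8)        # 118, matching "Loaded train dataset with 118 examples"
valid_and_test = n - train  # 30 rows left for the 10/10 valid and test slices
```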
Example formatted training example:
User: Taking vitamin C will prevent you from catching a cold.
Assistant thinking: W evaluation: This belief has moderate W due to mixed evidence. While vitamin C is important for immune function (high-W belief), the claim that it prevents colds contradicts clinical trial results showing limited preventative effects (though it may reduce duration/severity slightly). C assessment: Moderate C as nutritional effects on health are well-studied, but this specific claim conflicts with multiple controlled studies. The belief operates in a domain (medicine) where empirical evidence carries high weight. DP calculation: Moderate DP generated as the claim simplifies a more complex reality and overstates vitamin C's effects, contradicting clinical evidence. Resolution process: Qualify the claim by acknowledging vitamin C's role in immune health while correcting the overstated preventative effect.
Assistant: While vitamin C is important for immune function, research doesn't support that it prevents colds. Numerous clinical trials have found that regular vitamin C supplementation doesn't significantly reduce the likelihood of catching a cold for most people, though it may slightly reduce the duration and severity of symptoms. Maintaining adequate vitamin C is good for overall health, but it's not a guaranteed cold prevention method.
Running command:
['/Users/<>/.transformerlab/envs/transformerlab/bin/python3', '-um', 'mlx_lm.lora', '--model', 'mlx-community/Qwen2.5-7B-Instruct-4bit', '--iters', '1000', '--train', '--adapter-path', '/Users/<>/.transformerlab/workspace/adaptors/mlx-community_Qwen2.5-7B-Instruct-4bit/adaptor', '--num-layers', '16', '--batch-size', '4', '--learning-rate', '0.00005', '--data', '/Users/<>/.transformerlab/workspace/plugins/mlx_lora_trainer/data', '--steps-per-report', '100', '--steps-per-eval', '200', '--save-every', '100', '--config', '/Users/<>/.transformerlab/workspace/plugins/mlx_lora_trainer/config.yaml']
Training beginning:
Adaptor will be saved in: /Users/<>/.transformerlab/workspace/adaptors/mlx-community_Qwen2.5-7B-Instruct-4bit/adaptor
Writing logs to: /Users/<>/.transformerlab/workspace/experiments/alpha/tensorboards/StoicGroundedReasoner/20250408-134422
Loading configuration file /Users/<>/.transformerlab/workspace/plugins/mlx_lora_trainer/config.yaml
Loading pretrained model

Fetching 9 files:   0%|          | 0/9 [00:00<?, ?it/s]
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 26810.18it/s]
Loading datasets
Training
Trainable parameters: 0.033% (2.523M/7615.617M)
Starting training..., iters: 1000
Progress:  0.10%
Validation Loss:  2.314
Iter 1: Val loss 2.314, Val took 84.114s
/Users/<>/.transformerlab/envs/transformerlab/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
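The leaked-semaphore warning is typically printed after the worker process has already died, so it points at the symptom rather than the cause. One way to test the out-of-memory hypothesis is a lower-footprint rerun of the same command; every flag below already appears in the "Running command" line above, and only the batch size and layer count are changed (the reduced values are suggestions, not verified fixes):

```python
# Hypothetical lower-memory rerun of the logged mlx_lm.lora command:
# a smaller batch and fewer trainable layers reduce peak memory on a 7B model.
cmd = [
    "python3", "-um", "mlx_lm.lora",
    "--model", "mlx-community/Qwen2.5-7B-Instruct-4bit",
    "--train",
    "--iters", "1000",
    "--batch-size", "1",   # was 4 in the failing run
    "--num-layers", "4",   # was 16 in the failing run
    "--data", "/Users/<>/.transformerlab/workspace/plugins/mlx_lora_trainer/data",
]
# import subprocess; subprocess.run(cmd, check=True)  # uncomment to launch
```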

itsPreto avatar Apr 08 '25 18:04 itsPreto

Hi @itsPreto, we fixed some things in the newer versions. Are you still facing the same issue? Based on the logs, it looks like training simply quits partway through. Could you check whether the machine is running out of memory?

deep1401 avatar Apr 15 '25 18:04 deep1401

Closing this since it's a stale issue now.

deep1401 avatar May 02 '25 21:05 deep1401