transformerlab-app
No longer able to fine-tune with version 0.12.0
macOS 14.7.1 (23H222)
The job ran for a few minutes and then crashed.
Here's the output:
-- RUN 2025-04-08 13:44:16--
Plugin dir: /Users/<>/.transformerlab/workspace/plugins/mlx_lora_trainer
Arguments:
Namespace(input_file='/Users/<>/.transformerlab/workspace/temp/plugin_input_29.json')
Input:
{
"experiment": {
"id": 1,
"name": "alpha",
"config": {
"foundation": "mlx-community/Qwen2.5-7B-Instruct-4bit",
"adaptor": "",
"foundation_model_architecture": "MLX",
"foundation_filename": "",
"generationParams": "{\"temperature\": 0.7, \"maxTokens\": 1024, \"topP\": 1.0, \"frequencyPenalty\": 0.0}",
"inferenceParams": {
"inferenceEngine": "mlx_server",
"inferenceEngineFriendlyName": ""
},
"prompt_template": {
"system_message": "You are a helpful assistant that matches user speech transcriptions to available commands. Identify if the user's transcribed speech matches one of the available commands. Return only the command name or 'no_match' if no command matches.\n\nAvailable commands:\n- grab_handle: grab handle down, grab handle up, lower grab handle, raise grab handle, handle down, handle up, drop handle, lift handle\n- toilet_flush: flush toilet, toilet flush, flush, flush the toilet\n- trash_lid: open trash, close trash, trash open, trash close, trash lid open, trash lid close, open trash bin, close trash bin, open bin, close bin\n- attendant_call: call attendant, attendant call, call flight attendant, call for help, call for service, call for assistance\n\nExample 1:\nUser transcription: \"flush toilet\"\nCommand: toilet_flush\n\nExample 2:\nUser transcription: \"um can you please open the trash\"\nCommand: trash_lid\n\nExample 3:\nUser transcription: \"I need some help with my meal\"\nCommand: attendant_call\n\nExample 4:\nUser transcription: \"help me flushh the toilett\"\nCommand: toilet_flush\n\nNow identify the command in the following user transcription:"
},
"embedding_model": "BAAI/bge-base-en-v1.5",
"embedding_model_filename": "",
"embedding_model_architecture": "BertModel"
},
"created_at": "2025-02-08 04:06:37",
"updated_at": "2025-02-08 04:06:37"
},
"config": {
"template_name": "StoicGroundedReasoner",
"plugin_name": "mlx_lora_trainer",
"model_name": "mlx-community/Qwen2.5-7B-Instruct-4bit",
"model_architecture": "MLX",
"foundation_model_file_path": "",
"embedding_model": "BAAI/bge-base-en-v1.5",
"embedding_model_architecture": "BertModel",
"embedding_model_file_path": "",
"formatting_template": "User: {{input}}\nAssistant thinking: {{reasoning}}\nAssistant: {{response}}",
"dataset_name": "stoic_grounded_reasoning",
"lora_layers": "16",
"batch_size": "4",
"learning_rate": "0.00005",
"lora_rank": "8",
"lora_alpha": "16",
"iters": "1000",
"steps_per_report": "100",
"steps_per_eval": "200",
"save_every": "100",
"adaptor_name": "adaptor",
"fuse_model": "on",
"type": "LoRA",
"job_id": 29,
"adaptor_output_dir": "/Users/<>/.transformerlab/workspace/adaptors/mlx-community_Qwen2.5-7B-Instruct-4bit/adaptor",
"output_dir": "/Users/<>/.transformerlab/workspace/experiments/alpha/tensorboards/StoicGroundedReasoner"
}
}
LoRA config:
{'lora_parameters': {'alpha': '16', 'rank': '8', 'scale': 2.0, 'dropout': 0}}
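(Side note: the scale value in that config is consistent with the usual LoRA convention scale = alpha / rank; a quick sanity check, assuming the plugin follows that convention:)

```python
# Quick check of the logged LoRA scale, assuming the common scale = alpha / rank
# convention; the "scale": 2.0 printed above is consistent with it.
lora_alpha = 16
lora_rank = 8
scale = lora_alpha / lora_rank
print(scale)  # 2.0 -- matches the 'scale' in the LoRA config above
```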
No validation slice found in dataset /Users/<>/.transformerlab/workspace/datasets/stoic_grounded_reasoning:
Using a default 80/10/10 split for training, test and valid.
Loaded train dataset with 118 examples.
Loaded valid dataset with 15 examples.
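(For context, those counts fit a dataset of roughly 148 examples: 148 × 0.8 ≈ 118 train and 148 × 0.1 ≈ 15 valid. A minimal sketch of how such a default 80/10/10 split could be produced with the Hugging Face datasets library; the plugin's actual implementation may differ, and the file name is illustrative:)

```python
# Hypothetical sketch of a default 80/10/10 train/test/valid split using the
# Hugging Face `datasets` library; the mlx_lora_trainer plugin's real code may differ.
from datasets import load_dataset

ds = load_dataset("json", data_files="stoic_grounded_reasoning.json")["train"]
split = ds.train_test_split(test_size=0.2, seed=42)                # 80% train / 20% held out
held_out = split["test"].train_test_split(test_size=0.5, seed=42)  # 10% test / 10% valid
train_ds, test_ds, valid_ds = split["train"], held_out["train"], held_out["test"]
print(len(train_ds), len(test_ds), len(valid_ds))
```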
Example formatted training example:
User: Taking vitamin C will prevent you from catching a cold.
Assistant thinking: W evaluation: This belief has moderate W due to mixed evidence. While vitamin C is important for immune function (high-W belief), the claim that it prevents colds contradicts clinical trial results showing limited preventative effects (though it may reduce duration/severity slightly). C assessment: Moderate C as nutritional effects on health are well-studied, but this specific claim conflicts with multiple controlled studies. The belief operates in a domain (medicine) where empirical evidence carries high weight. DP calculation: Moderate DP generated as the claim simplifies a more complex reality and overstates vitamin C's effects, contradicting clinical evidence. Resolution process: Qualify the claim by acknowledging vitamin C's role in immune health while correcting the overstated preventative effect.
Assistant: While vitamin C is important for immune function, research doesn't support that it prevents colds. Numerous clinical trials have found that regular vitamin C supplementation doesn't significantly reduce the likelihood of catching a cold for most people, though it may slightly reduce the duration and severity of symptoms. Maintaining adequate vitamin C is good for overall health, but it's not a guaranteed cold prevention method.
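(That example is just the formatting_template from the config rendered against one dataset row. A minimal sketch of the rendering, assuming Jinja-style substitution, which the {{...}} placeholder syntax suggests but which I haven't confirmed in the plugin source:)

```python
# Minimal sketch of rendering the formatting_template from the config above.
# Using Jinja2 here is an assumption based on the {{...}} placeholder syntax.
from jinja2 import Template

formatting_template = "User: {{input}}\nAssistant thinking: {{reasoning}}\nAssistant: {{response}}"
row = {
    "input": "Taking vitamin C will prevent you from catching a cold.",
    "reasoning": "W evaluation: ...",   # abbreviated; full text shown above
    "response": "While vitamin C is important for immune function, ...",
}
print(Template(formatting_template).render(**row))
```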
Running command:
['/Users/<>/.transformerlab/envs/transformerlab/bin/python3', '-um', 'mlx_lm.lora', '--model', 'mlx-community/Qwen2.5-7B-Instruct-4bit', '--iters', '1000', '--train', '--adapter-path', '/Users/<>/.transformerlab/workspace/adaptors/mlx-community_Qwen2.5-7B-Instruct-4bit/adaptor', '--num-layers', '16', '--batch-size', '4', '--learning-rate', '0.00005', '--data', '/Users/<>/.transformerlab/workspace/plugins/mlx_lora_trainer/data', '--steps-per-report', '100', '--steps-per-eval', '200', '--save-every', '100', '--config', '/Users/<>/.transformerlab/workspace/plugins/mlx_lora_trainer/config.yaml']
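(The command above is just the job config mapped onto mlx_lm.lora CLI flags: lora_layers → --num-layers, iters → --iters, and so on. A rough, hypothetical sketch of how a plugin might assemble and launch it; this is not the plugin's actual code, and the config dict is abbreviated:)

```python
# Hypothetical sketch of assembling the mlx_lm.lora invocation from the job config.
# Flag names are taken from the logged command; the rest is illustrative.
import subprocess
import sys

config = {
    "model_name": "mlx-community/Qwen2.5-7B-Instruct-4bit",
    "iters": "1000",
    "lora_layers": "16",
    "batch_size": "4",
    "learning_rate": "0.00005",
}
cmd = [
    sys.executable, "-um", "mlx_lm.lora",
    "--model", config["model_name"],
    "--iters", config["iters"],
    "--train",
    "--num-layers", config["lora_layers"],
    "--batch-size", config["batch_size"],
    "--learning-rate", config["learning_rate"],
]
subprocess.run(cmd, check=True)  # raises CalledProcessError if training exits abnormally
```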
Training beginning:
Adaptor will be saved in: /Users/<>/.transformerlab/workspace/adaptors/mlx-community_Qwen2.5-7B-Instruct-4bit/adaptor
Writing logs to: /Users/<>/.transformerlab/workspace/experiments/alpha/tensorboards/StoicGroundedReasoner/20250408-134422
Loading configuration file /Users/<>/.transformerlab/workspace/plugins/mlx_lora_trainer/config.yaml
Loading pretrained model
Fetching 9 files: 0%| | 0/9 [00:00<?, ?it/s]
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 26810.18it/s]
Loading datasets
Training
Trainable parameters: 0.033% (2.523M/7615.617M)
Starting training..., iters: 1000
Progress: 0.10%
Validation Loss: 2.314
Iter 1: Val loss 2.314, Val took 84.114s
/Users/<>/.transformerlab/envs/transformerlab/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Hi @itsPreto, we fixed some things in the newer versions. Are you still facing the same issue? Based on the logs, it looks like training quits right after the first validation pass; I just wanted to make sure the machine isn't running out of memory.
Closing this since it's a stale issue now.