[BUG] Internal error when fine-tuning Gemma
E.g.:
mlx_lm.lora --model mlx-community/codegemma-7b-it-8bit --train --adapter-path adapters_codegemma_7B --data training_data --iters 500
Can result in:
libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Internal Error (0000000e:Internal Error)
zsh: abort mlx_lm.lora --model mlx-community/codegemma-7b-it-8bit --train --adapter-path
The split matmul on the output, which has a very large inner dimension (256k), appears to be the culprit. @jagrit06 is looking into this.
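For readers unfamiliar with the term: a "split" (split-K) matmul partitions the shared inner dimension of the product and sums the partial results, which is how a reduction as large as 256k gets mapped onto the GPU. The sketch below illustrates the idea in NumPy; it is only a conceptual illustration of the technique, not MLX's Metal kernel, and the sizes are tiny stand-ins for the real ones.

```python
import numpy as np

def split_k_matmul(x, w, chunk=64):
    """Compute x @ w by splitting the shared inner (K) dimension into
    chunks and accumulating the partial products. Conceptually similar
    to a split-K GPU matmul, where each chunk is a separate partial
    reduction that gets summed at the end."""
    k = x.shape[1]
    acc = np.zeros((x.shape[0], w.shape[1]))
    for i in range(0, k, chunk):
        acc += x[:, i:i + chunk] @ w[i:i + chunk, :]
    return acc

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 256))    # stand-in for activations
w = rng.standard_normal((256, 32))   # stand-in for a large weight

full = x @ w
split = split_k_matmul(x, w, chunk=64)
assert np.allclose(full, split)      # both paths agree numerically
```

The two paths are mathematically identical; the crash reported above happens inside the GPU implementation of this decomposition, not in the math itself.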
Hi, I'm experiencing the same issue. Is there a workaround?