Chime Ogbuji

10 comments by Chime Ogbuji

    # M1 Ultra 128GB
    # transformer_lm
    % python main.py
    Training a transformer with 153.883 M parameters
    Iter 10: Train loss 8.963, It/sec 0.347
    Iter 20: Train loss 8.379, It/sec 0.354...

If just training on a raw corpus, I have been using the raw text, per lora.py. For instruction datasets, I have been using the Mistral prompt format surrounding the input before...
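For illustration, a minimal sketch of what that wrapping could look like, assuming the standard Mistral instruct template and hypothetical `instruction`/`output` field names:

```python
# Minimal sketch: wrap an instruction/response pair in the Mistral instruct
# prompt format before writing it out as raw training text. The record field
# names ("instruction", "output") are illustrative, not mandated by lora.py.
def to_mistral_prompt(record: dict) -> str:
    return f"<s>[INST] {record['instruction']} [/INST] {record['output']}</s>"

print(to_mistral_prompt({"instruction": "Summarize the release notes.",
                         "output": "The release adds ..."}))
```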

The recent changes that allow the *loss* and *iterate_batches* functions to be specified for the tuning process have made doing this a lot more straightforward. I have done...
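As a rough illustration of plugging into those hooks, here is a sketch of a custom loss with the same `(model, inputs, targets, lengths)` shape as the tuner's default loss; treat the exact hook signature and the commented `train(...)` call as assumptions rather than a documented contract:

```python
import mlx.core as mx
import mlx.nn as nn

# Sketch of a drop-in loss: masked token-level cross entropy, averaged over
# the non-padded tokens of the batch.
def my_loss(model, inputs, targets, lengths):
    logits = model(inputs).astype(mx.float32)
    # Ignore positions beyond each example's true length.
    length_mask = mx.arange(inputs.shape[1])[None, :] < lengths[:, None]
    ce = nn.losses.cross_entropy(logits, targets) * length_mask
    ntoks = length_mask.sum()
    return ce.sum() / ntoks, ntoks

# Hypothetical invocation (argument names are assumptions):
# train(model, tokenizer, optimizer, train_set, val_set,
#       loss=my_loss, iterate_batches=my_iterate_batches)
```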

This is a great idea and another thing I wouldn't have to roll my own version of. The only thing I would add is a request for SGDR (see [cyclic-cosine-decay](https://github.com/abhuse/cyclic-cosine-decay))...
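For context, a small sketch of an SGDR-style schedule (cosine annealing with warm restarts) in the spirit of the linked cyclic-cosine-decay package; the cycle length, multiplier, and learning-rate bounds are illustrative:

```python
import math

def sgdr_lr(step, base_lr=1e-5, min_lr=1e-7, cycle_len=1000, cycle_mult=2.0):
    """Learning rate at `step` for cosine decay with periodic warm restarts."""
    t, length = step, cycle_len
    while t >= length:            # advance to the cycle that contains `step`
        t -= length
        length = int(length * cycle_mult)
    cosine = 0.5 * (1.0 + math.cos(math.pi * t / length))
    return min_lr + (base_lr - min_lr) * cosine

# e.g. set opt.learning_rate = sgdr_lr(it) at the top of each iteration
```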

Another pass at separating the model-specific bits from the training logic. Still keeping an eye on #213 to see if there is any synergy.

> Just pushed (proposed) final version of #213. Take a look and let me know how I can help utilize our changes together!

That would be fantastic! Sorry I...

> Is that the default `train.jsonl` or a custom one? You should split those lines o/w they will consume a ton of memory. See the section on [reducing memory use](https://github.com/ml-explore/mlx-examples/tree/main/lora#Memory-Issues)....
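A sketch of pre-splitting the long lines referenced in the quoted advice, assuming the lora example's `{"text": ...}` jsonl layout and an arbitrary character threshold:

```python
import json

# Sketch: split overly long "text" entries into smaller records so a single
# example does not blow up memory during batching. The 2048-character limit
# and the "text" key are assumptions about the dataset layout.
def split_long_lines(src="train.jsonl", dst="train_split.jsonl", max_chars=2048):
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            text = json.loads(line)["text"]
            for start in range(0, len(text), max_chars):
                fout.write(json.dumps({"text": text[start:start + max_chars]}) + "\n")

# split_long_lines()  # run once before training
```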

#235 is out of date. I can rebase it onto mlx-examples/main and update it (to support the parameters that have been added since I last worked on that PR) if there is...

> I think a more sustainable way to do this is the following:
>
> * Have a field in the Yaml which gives the layers keys to apply LoRA...
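To illustrate the quoted proposal, a sketch of reading such a field from the YAML and selecting the matching sub-modules; the `lora_layers` field name is hypothetical, and the model is assumed to expose an MLX-style `named_modules()`:

```python
import yaml

# Illustrative only: pick out the sub-modules whose names match the keys
# listed in the config. The actual swap to LoRA layers would reuse whatever
# adapter conversion the tuner already provides.
def select_lora_targets(model, config_path):
    with open(config_path) as f:
        config = yaml.safe_load(f)
    keys = set(config.get("lora_layers", []))   # e.g. {"self_attn.q_proj", "self_attn.v_proj"}
    return [(name, module) for name, module in model.named_modules()
            if any(name.endswith(k) for k in keys)]
```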

OK. I have also incorporated that into this PR.