Chime Ogbuji

10 comments by Chime Ogbuji

    # M1 Ultra 128GB
    # transformer_lm
    % python main.py
    Training a transformer with 153.883 M parameters
    Iter 10: Train loss 8.963, It/sec 0.347
    Iter 20: Train loss 8.379, It/sec 0.354...

If just training on a raw corpus, I have been using the raw text, per lora.py. For instruction datasets, I have been using the Mistral prompt format surrounding the input before...
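For illustration, a minimal sketch of what that wrapping could look like, assuming the standard Mistral instruct template and hypothetical `instruction`/`output` field names:

```python
# Minimal sketch: wrap an instruction/response pair in the Mistral instruct
# prompt format before writing it out as raw training text. The record field
# names ("instruction", "output") are illustrative, not mandated by lora.py.
def to_mistral_prompt(record: dict) -> str:
    return f"<s>[INST] {record['instruction']} [/INST] {record['output']}</s>"

print(to_mistral_prompt({"instruction": "Summarize the release notes.",
                         "output": "The release adds ..."}))
```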

The recent changes that allow the *loss* and *iterate_batches* functions to be specified for the tuning process have made doing this a lot more straightforward. I have done...
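As a rough illustration of plugging into those hooks, here is a sketch of a custom loss with the same `(model, inputs, targets, lengths)` shape as the tuner's default loss; treat the exact hook signature and the commented `train(...)` call as assumptions rather than a documented contract:

```python
import mlx.core as mx
import mlx.nn as nn

# Sketch of a drop-in loss: masked token-level cross entropy, averaged over
# the non-padded tokens of the batch.
def my_loss(model, inputs, targets, lengths):
    logits = model(inputs).astype(mx.float32)
    # Ignore positions beyond each example's true length.
    length_mask = mx.arange(inputs.shape[1])[None, :] < lengths[:, None]
    ce = nn.losses.cross_entropy(logits, targets) * length_mask
    ntoks = length_mask.sum()
    return ce.sum() / ntoks, ntoks

# Hypothetical invocation (argument names are assumptions):
# train(model, tokenizer, optimizer, train_set, val_set,
#       loss=my_loss, iterate_batches=my_iterate_batches)
```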

This is a great idea and another thing I wouldn't have to roll my own version of. The only thing I would add is a request for SGDR (see [cyclic-cosine-decay](https://github.com/abhuse/cyclic-cosine-decay))...
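For context, a small sketch of an SGDR-style schedule (cosine annealing with warm restarts) in the spirit of the linked cyclic-cosine-decay package; the cycle length, multiplier, and learning-rate bounds are illustrative:

```python
import math

def sgdr_lr(step, base_lr=1e-5, min_lr=1e-7, cycle_len=1000, cycle_mult=2.0):
    """Learning rate at `step` for cosine decay with periodic warm restarts."""
    t, length = step, cycle_len
    while t >= length:            # advance to the cycle that contains `step`
        t -= length
        length = int(length * cycle_mult)
    cosine = 0.5 * (1.0 + math.cos(math.pi * t / length))
    return min_lr + (base_lr - min_lr) * cosine

# e.g. set opt.learning_rate = sgdr_lr(it) at the top of each iteration
```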

Another pass at separating the model-specific bits from the training logic. Still keeping an eye on #213 to see if there is any synergy.

> Just pushed (proposed) final version of #213. Take a look and let me know how I can help utilize our changes together!

That would be fantastic! Sorry I...

> Is that the default `train.jsonl` or a custom one? You should split those lines o/w they will consume a ton of memory. See the section on [reducing memory use](https://github.com/ml-explore/mlx-examples/tree/main/lora#Memory-Issues)....
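A sketch of pre-splitting the long lines referenced in the quoted advice, assuming the lora example's `{"text": ...}` jsonl layout and an arbitrary character threshold:

```python
import json

# Sketch: split overly long "text" entries into smaller records so a single
# example does not blow up memory during batching. The 2048-character limit
# and the "text" key are assumptions about the dataset layout.
def split_long_lines(src="train.jsonl", dst="train_split.jsonl", max_chars=2048):
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            text = json.loads(line)["text"]
            for start in range(0, len(text), max_chars):
                fout.write(json.dumps({"text": text[start:start + max_chars]}) + "\n")

# split_long_lines()  # run once before training
```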

#235 is out of date. I can rebase it onto mlx-examples/main and update it (to support the parameters that have been added since I last worked on that PR) if there is...

> I think a more sustainable way to do this is the following:
>
> * Have a field in the Yaml which gives the layers keys to apply LoRA...
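To illustrate the quoted proposal, a sketch of reading such a field from the YAML and selecting the matching sub-modules; the `lora_layers` field name is hypothetical, and the model is assumed to expose an MLX-style `named_modules()`:

```python
import yaml

# Illustrative only: pick out the sub-modules whose names match the keys
# listed in the config. The actual swap to LoRA layers would reuse whatever
# adapter conversion the tuner already provides.
def select_lora_targets(model, config_path):
    with open(config_path) as f:
        config = yaml.safe_load(f)
    keys = set(config.get("lora_layers", []))   # e.g. {"self_attn.q_proj", "self_attn.v_proj"}
    return [(name, module) for name, module in model.named_modules()
            if any(name.endswith(k) for k in keys)]
```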

OK. I have also incorporated that into this PR.