
Inference Llama 2 in one file of pure C

Results: 146 llama2.c issues (sorted by recently updated)

Thanks for the amazing, clean library. I had a few issues running it on MPS, so I thought I'd share how I got it to work. May not be suitable for...

[Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) **We follow the “Textbooks Are All You Need” approach, focusing this time on common sense reasoning in natural language**, and create...

In the training script, after forwarding the data to the model, there's [a line of code](https://github.com/karpathy/llama2.c/blob/766a30bc6e9a1c69ce007bb69caabf4c6062f0e9/train.py#L308) that says: `# immediately async prefetch next batch while model is doing the...`

https://github.com/tairov/llama2.mojo Imagine 250x speed on the original...

I trained and LoRA fine-tuned (inspired by wlamond's PR) the models to follow instructions and write tiny stories accordingly, with the prompt data available. [Repo](https://github.com/cindysridykhan/instruct_storyteller_tinyllama2) [blogpost](https://medium.com/@cindy.sridykhan/how-to-train-and-fine-tune-an-instruct-llama2-model-in-pytorch-6cbe11de2b34) Demo: ![story1080](https://github.com/karpathy/llama2.c/assets/142806803/c99ce0be-7b42-446b-b8a2-639132053785) Any feedback...

I started looking into whether the small models could be a target for interpretability. I'm just putting it out here for lack of a better space to find people wanting...

Dear llama2.c developer, Greetings! I am vansinhu, a community developer and volunteer at InternLM. [InternLM](https://github.com/InternLM/InternLM) is a large language model similar to llama2, and we look forward to InternLM being...

Hi, Thank you for the fantastic repo. I recently picked up interest in FSDP. Is it possible to adapt this model to FSDP? If yes, what are the things that...

The new export code instantiates a Transformer() model, so it needs double the memory (float32 weights vs. bfloat16 in the Meta checkpoints). The old Llama-only exporter code can easily...

I rewrote the code of the most critical function, `matmul()`, and it runs about 3.5-4x faster than the original (on a Mac with an ARM M1 Max). Maybe it...