New export code OOMs with 7B model
The new export code instantiates a Transformer() model, so it needs double the memory (float32 weights vs. bfloat16 in the Meta checkpoints).
The old Llama-only exporter can easily export the 7B model with 16 GB of RAM (in under 1 minute, not 10 as the README says); the new one gets killed because it runs out of memory.
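To illustrate what I mean, here is a toy sketch (not the actual export code, just my understanding of the pattern): copying a bf16 checkpoint into a freshly constructed fp32 module keeps both copies resident at once.

```python
import torch
import torch.nn as nn

# Toy illustration, NOT the real export.py: loading a bf16 state dict into a
# freshly built fp32 module means both the checkpoint tensors and the fp32
# parameters are resident at the same time (~2 + 4 = 6 bytes per parameter),
# whereas a streaming export only ever holds the bf16 checkpoint plus one
# transient fp32 tensor.
dim = 4096
layer = nn.Linear(dim, dim, bias=False)                # fp32 storage, ~4 bytes/param
state = {"weight": torch.randn(dim, dim).bfloat16()}   # stand-in for a Meta checkpoint

layer.load_state_dict(state)   # copies (and casts) into the existing fp32 storage
print(layer.weight.dtype)      # torch.float32
# `state` is still alive here, so the bf16 and fp32 copies coexist in RAM
# until it is explicitly dropped.
```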
I see, thanks for raising. Thinking...
Just noting that this runs out of memory on my 32GB system, so it seems to be more than doubling the memory use.
The legacy export script probably still works, I'm guessing?
https://github.com/karpathy/llama2.c/blob/de005474d37d0cde1356739b8c79ebe7b42b5973/export_meta_llama_bin.py
As a temporary patch... sigh
Indeed, I can confirm that it runs in a few minutes (sorry, I did not time it, but ~2 mins) for me. [Python 3.8, Ubuntu 20.04, 32 GB system]
FWIW, and connected to the above, the new export.py script OOMs with the llama-2-70B model on a 197 GB machine. llama-2-13B seems to export fine on the same machine.
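That is roughly what you would expect if the bf16 checkpoint and a float32 Transformer() have to fit in RAM at the same time. A quick back-of-the-envelope (the ~6 bytes/param peak is my assumption, not a measurement):

```python
# Rough peak-RAM estimate: bf16 checkpoint (~2 bytes/param) plus a float32
# Transformer() (~4 bytes/param) held simultaneously, vs. a streaming export
# that only ever needs the bf16 checkpoint resident.
GIB = 1024**3

for name, n_params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    streaming = n_params * 2 / GIB          # old-style export
    via_module = n_params * (2 + 4) / GIB   # new export with Transformer() instantiated
    print(f"{name}: streaming ~{streaming:.0f} GiB, via Transformer() ~{via_module:.0f} GiB")

# 7B:  streaming ~13 GiB, via Transformer() ~39 GiB   -> dies on 16-32 GB boxes
# 13B: streaming ~24 GiB, via Transformer() ~73 GiB   -> fits on a 197 GB box
# 70B: streaming ~130 GiB, via Transformer() ~391 GiB -> does not fit on a 197 GB box
```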
Running llama-2-13B models exported with --version 2 or --version 1 results in a core dump:
```
./run llama-2-13b.bin -n 25 -i "Once upon a time"
Segmentation fault (core dumped)

./run llama-2-13b_quantized.bin -n 25 -i "Once upon a time"
Segmentation fault (core dumped)
```
The model exported with --version 0 works fine:
```
./run llama-2-13b_0.bin -n 25 -i "Once upon a time"
Once upon a time, the fishing community was known for its strength, courage and determination
```
The llama-2-70B model exports fine with the old script and runs fine too:
```
OMP_NUM_THREADS=4 ./run llama-2-70b.bin -n 25 -i "Once upon a time"
Once upon a time, there was a young guy who spent most of his life waiting for things to happen. He hoped
achieved tok/s: 0.010036
```
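For the segfaulting --version 1 / --version 2 files above, one quick sanity check is to peek at the first few bytes of the .bin to see which format it actually is, and whether the run binary you built understands that format. This is only a sketch based on my reading of export.py; the magic value and header layout are assumptions and may differ by commit:

```python
import struct
import sys

# Sketch of a header check (my assumptions about the layout, not authoritative):
# the refactored export.py appears to write a uint32 magic followed by an int32
# version for --version 1/2 files, while legacy --version 0 files start directly
# with the int32 config fields (dim, hidden_dim, ...).
path = sys.argv[1]
with open(path, "rb") as f:
    magic, version = struct.unpack("<Ii", f.read(8))

if magic == 0x616B3432:  # "ak42" magic (assumption)
    print(f"new-style file, header version {version}")
else:
    # No magic: likely a legacy --version 0 file, where the first int is the model dim.
    print(f"no magic found (first int = {magic}); probably a legacy --version 0 file")
```

If the magic and version look right, the crash is more likely the run binary not understanding that header version (e.g. the quantized --version 2 file needing the separate quantized run build) than a corrupted export.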