New export code OOMs with 7B model
The new export code instantiates a Transformer() model, so it needs double the memory (float32 weights vs. bfloat16 in the Meta checkpoints).
The old Llama-only exporter can easily export the 7B model with 16 GB of RAM (in under 1 minute, not 10 as the README says); the new one gets killed because it runs out of memory.
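To illustrate what I mean, here is a toy sketch (not the actual export code, just my understanding of the pattern): copying a bf16 checkpoint into a freshly constructed fp32 module keeps both copies resident at once.

```python
import torch
import torch.nn as nn

# Toy illustration, NOT the real export.py: loading a bf16 state dict into a
# freshly built fp32 module means both the checkpoint tensors and the fp32
# parameters are resident at the same time (~2 + 4 = 6 bytes per parameter),
# whereas a streaming export only ever holds the bf16 checkpoint plus one
# transient fp32 tensor.
dim = 4096
layer = nn.Linear(dim, dim, bias=False)                # fp32 storage, ~4 bytes/param
state = {"weight": torch.randn(dim, dim).bfloat16()}   # stand-in for a Meta checkpoint

layer.load_state_dict(state)   # copies (and casts) into the existing fp32 storage
print(layer.weight.dtype)      # torch.float32
# `state` is still alive here, so the bf16 and fp32 copies coexist in RAM
# until it is explicitly dropped.
```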
I see, thanks for raising. Thinking...
Just noting that this runs out of memory on my 32GB system, so it seems to be more than doubling the memory use.
The legacy export script probably still works, I'm guessing?
https://github.com/karpathy/llama2.c/blob/de005474d37d0cde1356739b8c79ebe7b42b5973/export_meta_llama_bin.py
As a temporary patch... sigh
Indeed, I can confirm that it runs in a few minutes (sorry, I did not time it, but ~2 mins) for me. [Python 3.8, Ubuntu 20.04, 32 GB system]
FWIW, and connected to the above, the new export.py script OOMs with the llama-2-70B model on a 197 GB machine. llama-2-13B seems to export fine on the same machine.
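That is roughly what you would expect if the bf16 checkpoint and a float32 Transformer() have to fit in RAM at the same time. A quick back-of-the-envelope (the ~6 bytes/param peak is my assumption, not a measurement):

```python
# Rough peak-RAM estimate: bf16 checkpoint (~2 bytes/param) plus a float32
# Transformer() (~4 bytes/param) held simultaneously, vs. a streaming export
# that only ever needs the bf16 checkpoint resident.
GIB = 1024**3

for name, n_params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    streaming = n_params * 2 / GIB          # old-style export
    via_module = n_params * (2 + 4) / GIB   # new export with Transformer() instantiated
    print(f"{name}: streaming ~{streaming:.0f} GiB, via Transformer() ~{via_module:.0f} GiB")

# 7B:  streaming ~13 GiB, via Transformer() ~39 GiB   -> dies on 16-32 GB boxes
# 13B: streaming ~24 GiB, via Transformer() ~73 GiB   -> fits on a 197 GB box
# 70B: streaming ~130 GiB, via Transformer() ~391 GiB -> does not fit on a 197 GB box
```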
Running llama-2-13B models exported with --version 2 or --version 1 results in a core dump:
```
./run llama-2-13b.bin -n 25 -i "Once upon a time"
Segmentation fault (core dumped)

./run llama-2-13b_quantized.bin -n 25 -i "Once upon a time"
Segmentation fault (core dumped)
```
The model exported with --version 0 works fine:
```
./run llama-2-13b_0.bin -n 25 -i "Once upon a time"
Once upon a time, the fishing community was known for its strength, courage and determination
```
The llama-2-70B model exports fine with the old script and runs fine too:
```
OMP_NUM_THREADS=4 ./run llama-2-70b.bin -n 25 -i "Once upon a time"
Once upon a time, there was a young guy who spent most of his life waiting for things to happen. He hoped
achieved tok/s: 0.010036
```
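For the segfaulting --version 1 / --version 2 files above, one quick sanity check is to peek at the first few bytes of the .bin to see which format it actually is, and whether the run binary you built understands that format. This is only a sketch based on my reading of export.py; the magic value and header layout are assumptions and may differ by commit:

```python
import struct
import sys

# Sketch of a header check (my assumptions about the layout, not authoritative):
# the refactored export.py appears to write a uint32 magic followed by an int32
# version for --version 1/2 files, while legacy --version 0 files start directly
# with the int32 config fields (dim, hidden_dim, ...).
path = sys.argv[1]
with open(path, "rb") as f:
    magic, version = struct.unpack("<Ii", f.read(8))

if magic == 0x616B3432:  # "ak42" magic (assumption)
    print(f"new-style file, header version {version}")
else:
    # No magic: likely a legacy --version 0 file, where the first int is the model dim.
    print(f"no magic found (first int = {magic}); probably a legacy --version 0 file")
```

If the magic and version look right, the crash is more likely the run binary not understanding that header version (e.g. the quantized --version 2 file needing the separate quantized run build) than a corrupted export.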