
export model to fp16

Open kroggen opened this issue 2 years ago • 4 comments

kroggen avatar Aug 23 '23 22:08 kroggen

Ok. I see you went for a much deeper change.

Did you manage to test it?

rdentato avatar Aug 23 '23 22:08 rdentato

It is not tested yet. I am trying to implement loading of the model (version 0 and maybe 1).
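(Editor's note: a minimal sketch of the round trip being discussed, assuming raw little-endian fp16 tensors with no header; the actual llama2.c file format for version 0/1 checkpoints is not reproduced here, and `export_fp16`/`load_fp16` are hypothetical names.)

```python
import numpy as np

def export_fp16(weights, path):
    # Downcast float32 weights to fp16 and write them as raw bytes.
    np.asarray(weights, dtype=np.float32).astype(np.float16).tofile(path)

def load_fp16(path, count):
    # Read fp16 values back and upcast to float32 for computation.
    return np.fromfile(path, dtype=np.float16, count=count).astype(np.float32)
```

The upcast on load is the "conversion on the fly" that direct fp16 kernels would avoid.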

kroggen avatar Aug 23 '23 22:08 kroggen

Question: what is the benefit of fp16?

  • As the Llama 2 models were trained in bf16 I find fp16 conversion sketchy. For newly trained models this is less of a concern
  • The file sizes are ofc ~2X smaller
  • The code is a little bit more bloated

Am I missing some considerations?
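(Editor's note: the range concern behind "fp16 conversion is sketchy" can be illustrated directly. bf16 keeps float32's 8-bit exponent (max ~3.4e38), while fp16 has only a 5-bit exponent (max finite value 65504), so a bf16-trained weight or activation above that range overflows to inf when cast to fp16. numpy has no bf16 type, so float32 stands in for the bf16 value below.)

```python
import numpy as np

# A value that fits easily in bf16's range (it shares float32's exponent)...
x = np.float32(70000.0)
# ...but exceeds fp16's max finite value of 65504, so the cast overflows.
y = np.float16(x)
print(y)  # inf
```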

karpathy avatar Aug 25 '23 15:08 karpathy

The point is that they can be loaded directly into the GPU. Not needing on-the-fly conversion (and having a smaller file to load) significantly reduces the load time (which, for my Tesla T4, is around 2 min for the llama2_7b models).

Also, I tested llama2.c on an ARM machine using its native support for fp16 and it works like a charm (and ARM CPUs are cheaper on AWS).
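(Editor's note: the "~2X smaller" file size from the list above, which drives the load-time win, can be checked directly; the matrix here is just a stand-in for a checkpoint's tensors.)

```python
import numpy as np

# A stand-in weight tensor; a real checkpoint is many such tensors.
w = np.random.rand(1024, 1024).astype(np.float32)
half = w.astype(np.float16)
# fp16 uses 2 bytes per element vs 4 for fp32: exactly half the bytes to read.
assert half.nbytes * 2 == w.nbytes
```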

rdentato avatar Aug 25 '23 15:08 rdentato