see here
Hey @nsosio, just clarifying here: for PyTorch (#21), is it simply using the HF PyTorch .bin file for Llama-2 7B at fp16/32 precision?
Whereas for gpt-fast, it's the latest implementation from PyTorch Labs, right?