James Whedbee
Results
1
issues of
James Whedbee
### Some context I am using AMD MI100 GPUs and I can get ~33 tokens/second for Llama 2 70B using - compile - tensor parallelism of 8 - int8 quantization...