Results: 2 issues of Ian Barber
Very slow tokens/second in FP32; it feels worse than it should be, but I'm not entirely sure of the best way to debug it. $ python3 torchchat.py generate --prompt "hello model" -v llama2...
performance
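To debug a report like this, a first step is to measure tokens/second reproducibly before changing anything. A minimal sketch of such a harness, assuming a hypothetical `generate_fn` callable standing in for the model's generate step (the stub below is not torchchat code):

```python
import time

def tokens_per_second(generate_fn, prompt, *, runs=3):
    """Average tokens/second over several runs of a generation callable.

    `generate_fn` is a hypothetical stand-in for a model generate step;
    it must accept a prompt string and return a sequence of tokens.
    """
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate_fn(prompt)
        elapsed = time.perf_counter() - start
        rates.append(len(tokens) / elapsed)
    return sum(rates) / len(rates)

# Stub generator standing in for a real model call.
def fake_generate(prompt):
    return prompt.split() * 100

rate = tokens_per_second(fake_generate, "hello model")
```

Comparing this number across dtypes (FP32 vs FP16/BF16) and batch settings narrows down whether the slowdown is in the model math or elsewhere in the pipeline.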
#### Context
What is the purpose of this PR? Is it to
- [x] add a new feature
- [ ] fix a bug
- [ ] update tests...
CLA Signed