Ian Barber

Results: 2 issues by Ian Barber

Very slow tokens/second in FP32; it feels worse than it should be, but I'm not entirely sure of the best way to debug it. `$ python3 torchchat.py generate --prompt "hello model" -v llama2...`

performance
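One way to start debugging throughput, independent of the model runner, is to time the token loop yourself and compute tokens/second directly. A minimal sketch of that measurement — the `fake_generate` generator below is a hypothetical stand-in for the real decode step, not torchchat's API:

```python
import time

def fake_generate(n_tokens):
    # Hypothetical stand-in for a model's per-token decode loop.
    for _ in range(n_tokens):
        time.sleep(0.001)  # simulate per-token latency
        yield 0

def tokens_per_second(token_iter):
    # Time how long it takes to consume the iterator and count tokens.
    start = time.perf_counter()
    count = sum(1 for _ in token_iter)
    elapsed = time.perf_counter() - start
    return count / elapsed

rate = tokens_per_second(fake_generate(50))
print(f"{rate:.1f} tokens/sec")
```

Comparing this kind of wall-clock number across dtypes (FP32 vs. FP16/BF16) usually shows quickly whether the slowdown is in the decode loop itself or elsewhere.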

#### Context

What is the purpose of this PR? Is it to

- [x] add a new feature
- [ ] fix a bug
- [ ] update tests...

CLA Signed