22quinn

Results 2 issues of 22quinn

Summary: `log1p(x)` is more precise than `log(1+x)` when `x` is close to 0. We utilize cuda `log1pf` implementation for fp32. For other precision types, input is first converted to float,...

CLA Signed
fb-exported

Follows the implementation in [vllm/logits_process.py](https://github.com/vllm-project/vllm/blob/main/vllm/logits_process.py) There were two choices on where to tokenise bad words and the second path was taken: 1. Pass both `bad_words` and `tokenizer` via requests, but...

v1