Nicolas Patry

Results 978 comments of Nicolas Patry

Hi @leiwen83 Indeed, beam search is not implemented; however, we have a different algorithm which seems to work just as well or even better. `best_of` taking the best of `n`...

Beam search is much worse than best_of performance-wise. The timing difference you show here is surprisingly large. How did you measure (model, hardware, where did you get the timing...

Oh I see bnb-nf4 is just super slow on anything above batch_size=1. It has nothing to do with best_of.

@bloodsucker99 do you mind opening a PR for it? I'm not sure where the clear should be added.

@Rogerwyf I made the PR for it: https://github.com/huggingface/text-generation-inference/pull/829 Thank you @bloodsucker99. However, if that fixes it, it looks like it might not be an actual leak, just torch allocator...

Have you tried taking the latest image for a spin?

@ZeroYuJie What hardware + CUDA version + environment?

> --quantize bitsandbytes-nf4 This seems to be coming up every time I see this issue; it seems to be bnb leaking. We happen to not use it in production ourselves which...

Thanks a lot for this PR and for fixing the unsoundness (unsafe). This PR seems even slightly better, using Atomic instead (which are lock-free). https://github.com/huggingface/tokenizers/pull/1532

It's a very interesting idea that has been discussed internally before. Thanks for reopening the discussion. The (legal) work needed would be non-trivial, so if huggingface could get a...