Matthias Reso
Hi @felipemello1, thanks for looking into this again. The bf16 vs. bfloat16 distinction is a good catch, though this is only true for dtype in the main part of the...
@felipemello1 It turns out the precision in the quantizer was a red herring. It actually needs to be float32 to capture the result of the matmul (see the inductor error mentioned...
@felipemello1 Here is the PR: #1371
@ankithagunapal, do you recall what kind of issues you ran into when updating to 24.04? If not, it would be great to move forward with this.
Hi, I am also seeing different results for the same prompt even though temperature is set to 0. The complete sampling parameters are: ``` SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=0,...
We also need to go through all examples and add the right options.
The benchmark tool is also affected.
This needs to be described in the API docs as well.
Hi @yhna940, thanks for the contribution! Could you add the same description you added to the PR to serve/docker/README.md and also add a line referring to that section to serve/CONTRIBUTING.md?...
Thanks for flagging this, @liaddrori1. @namannandan, do you have bandwidth to look at this?