Kirat Pandya
#2813 only covers "same prompt, multiple output", not "multiple prompt, multiple output".
> On a project with a million dependencies and libraries this might be a problem, but as there are no dependencies and it builds on anything, compilation shouldn't pose...
Beyond sampling parameters, the following would be very helpful: 1. Prompt token counts: makes it easier to potentially trim the next request 2. logprobs: extremely useful for scenarios like...
1. Yes. The idea would be to get the actual token counts for prompt and completion (something like this: https://platform.openai.com/docs/api-reference/making-requests) 2. Yes 3. That is fine
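For reference, a minimal sketch of the kind of per-request token accounting being asked for, assuming an OpenAI-style response shape. The `usage` field names (`prompt_tokens`, `completion_tokens`, `total_tokens`) come from the linked OpenAI docs; the rest of the response is abbreviated for illustration:

```python
# Sketch of an OpenAI-style completion response carrying token counts.
# Only the "usage" block matters here; other fields are abbreviated.
response = {
    "choices": [{"text": "...", "index": 0}],
    "usage": {
        "prompt_tokens": 5,       # tokens consumed by the prompt
        "completion_tokens": 7,   # tokens generated in the completion
        "total_tokens": 12,       # sum of the two
    },
}

# A client could inspect prompt_tokens to decide whether to trim
# the next request before it overruns the context window.
print(response["usage"]["prompt_tokens"])
```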
+1 Running into this. We run Docling inside a gRPC server, which requires a ThreadPoolExecutor, so moving to ProcessPoolExecutor is not an option (at least not a straightforward one)
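For context, a minimal sketch of the setup described above: Python's `grpc.server()` takes a `futures.ThreadPoolExecutor`, so request handlers (and any Docling conversion inside them) run on server threads. The servicer class, method name, and generated-stub helper below are illustrative assumptions, not Docling's actual gRPC integration:

```python
from concurrent import futures

import grpc

# Hypothetical servicer for illustration; in a real project this
# would subclass the stub generated from the service's .proto file.
class ConvertServicer:
    def Convert(self, request, context):
        # Docling conversion would run here, on one of the server's
        # worker threads -- grpc.server() only accepts a
        # ThreadPoolExecutor, so there is no straightforward place
        # to swap in a ProcessPoolExecutor.
        ...

def serve() -> None:
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=8))
    # add_ConvertServicer_to_server(ConvertServicer(), server)  # generated helper
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()

if __name__ == "__main__":
    serve()
```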
@ggerganov a bunch of these cool new toys (speculative exec, beam search) seem to be landing either in main or in separate executables in examples. Do you intend to push for...