Devis Lucato

Results: 288 comments by Devis Lucato

> > with batching: is the client sending too many chunks per batch?
>
> I'm still not sure how I can enable batching in Kernel Memory when running as...

Batch embedding generation is ready and released, thanks @alkampfergit (https://github.com/microsoft/kernel-memory/pull/531)! Quick notes:

* batch support added to the OpenAI and Azure OpenAI embedding generators
* batch size is configurable (sketch below). Default for OpenAI...
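For illustration, a minimal sketch of how the batch size could be set when configuring the embedding generator through the builder. The `MaxEmbeddingBatchSize` property name and the default value are assumptions based on the notes above, not verified against the released config classes:

```csharp
using System;
using Microsoft.KernelMemory;

// Minimal sketch, assuming the embedding generator config exposes a
// batch-size property (the name below is an assumption and may differ
// in the released API). With batching enabled, multiple chunks are
// sent per embedding request instead of one request per chunk.
var memory = new KernelMemoryBuilder()
    .WithOpenAI(new OpenAIConfig
    {
        APIKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY"),
        EmbeddingModel = "text-embedding-ada-002",
        MaxEmbeddingBatchSize = 100 // assumption: defaults differ per provider
    })
    .Build<MemoryServerless>();
```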

Work left before closing:

- reproducing 429
- change KM code to surface 429s appropriately (sketch below). E.g. when calling the KM service, if the AI internally returns 429, the KM web service should return...
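To make the second point concrete, here is a hypothetical sketch of what surfacing a 429 could look like in a minimal ASP.NET endpoint. The exception type and route are invented for illustration; they are not KM's actual types:

```csharp
using System;
using System.Net;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

var app = WebApplication.Create();

// Hypothetical sketch: when the call into the AI backend throws a
// throttling error, the web service replies with HTTP 429 instead of
// a generic 500, so the caller knows to back off and retry.
app.MapPost("/ask", async (HttpContext ctx) =>
{
    try
    {
        throw new UpstreamThrottlingException(); // stand-in for the AI call
    }
    catch (UpstreamThrottlingException)
    {
        ctx.Response.StatusCode = (int)HttpStatusCode.TooManyRequests;
        await ctx.Response.WriteAsync("Upstream AI service is throttling; retry later.");
    }
});

app.Run();

// Invented for illustration only; KM defines its own exception types.
public sealed class UpstreamThrottlingException : Exception { }
```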

^^ It's the same policy implemented here: https://github.com/microsoft/kernel-memory/blob/main/extensions/AzureOpenAI/Internals/ClientSequentialRetryPolicy.cs, used for both OpenAI and Azure OpenAI. In case of throttling (429) and 503, KM retries following the delay provided by the remote...
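The gist of that policy, as a standalone sketch rather than the actual ClientSequentialRetryPolicy code: retry sequentially on 429/503, honoring the delay the remote service suggests when one is provided:

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

static class RetrySketch
{
    // Standalone sketch of the retry idea described above, not the real
    // implementation: on 429 (throttling) or 503, wait for the
    // server-suggested delay (Retry-After header) and retry in sequence.
    public static async Task<HttpResponseMessage> SendWithRetryAsync(
        HttpClient client, Func<HttpRequestMessage> requestFactory, int maxRetries = 3)
    {
        for (var attempt = 0; ; attempt++)
        {
            // A new HttpRequestMessage per attempt: instances are single-use
            var response = await client.SendAsync(requestFactory());
            var retryable = response.StatusCode == HttpStatusCode.TooManyRequests
                            || response.StatusCode == HttpStatusCode.ServiceUnavailable;
            if (!retryable || attempt >= maxRetries) { return response; }

            // Follow the delay provided by the remote service when available
            var delay = response.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(1);
            await Task.Delay(delay);
        }
    }
}
```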

We tried making it async during the initial implementation, but it would affect the speed and complexity of the text chunker, which would need quite a bit of rewriting, and...

IIRC Llama uses SentencePiece; is anything available in that direction?

@glorious-beard thank you! As soon as I get a chance I'll do some tests 👍

Some updates:

* Docker image is available, notes in the main README
* All settings can be set using env vars, following the usual .NET configuration approach (example below); see Service's README...
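On the env vars point: the standard .NET convention maps a double underscore in an environment variable name to the `:` separator in configuration keys. The exact key path below is an assumption based on a typical KM appsettings layout, so check the Service's README for the real keys:

```csharp
using Microsoft.Extensions.Configuration;

// Standard .NET configuration layering: env vars override JSON values.
// Setting KernelMemory__Services__OpenAI__APIKey in the shell (or via
// docker -e) overrides KernelMemory:Services:OpenAI:APIKey from
// appsettings.json. The key path here is an assumption, not verified.
var config = new ConfigurationBuilder()
    .AddJsonFile("appsettings.json", optional: true)
    .AddEnvironmentVariables()
    .Build();

var apiKey = config["KernelMemory:Services:OpenAI:APIKey"];
```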

Update: I started looking into it and made a few changes in #201. This will need some more involved work, revisiting how text is extracted. It's doable, but not...

That's correct. Currently the service uses the same model for questions and summarization. You can use a different handler for summarization though, with your custom settings. Plugging in custom handlers...
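The registration pattern looks roughly like the sketch below. `MySummarizationHandler` is hypothetical (it would implement KM's `IPipelineStepHandler`, whose exact signature varies across versions, so its definition is omitted), and the step names in the import call are from memory; check the KM examples for the current API:

```csharp
using Microsoft.KernelMemory;

// Rough sketch, signatures from memory: register a custom summarization
// handler under its own step name, then run a pipeline that uses it in
// place of (or alongside) the default steps.
var memory = new KernelMemoryBuilder().Build<MemoryServerless>();
memory.Orchestrator.AddHandler<MySummarizationHandler>("my_summarize");

// Default step names ("extract", "partition", ...) are assumptions here;
// the custom step is slotted in where summarization should happen.
await memory.ImportDocumentAsync(
    "report.docx",
    steps: new[] { "extract", "partition", "my_summarize", "gen_embeddings", "save_records" });
```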