Support ChatCompletion streaming over Completions
The OpenAI/AzureOpenAI TextGeneration classes use the legacy Completions API (deprecated by OpenAI; see the deprecation announcement and the Azure model compatibility notes). This is a particular problem with the Azure OpenAI Service, where the Completions API requires deploying a gpt-35-turbo model of version 0301, which is itself deprecated. Due to quota limits on standard accounts, you are unlikely to be able to deploy that model alongside the embeddings model and a 0613 version of gpt-35-turbo for the application to use.
Moving Semantic Memory off the Completions API to ChatCompletions would unblock usage in Azure OpenAI applications and ensure that applications aren't caught by the upcoming deprecations.
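To illustrate the shape of the change: the ChatCompletions API can serve the same streamed-text role as the legacy Completions API by wrapping the plain prompt in a user message and yielding the `delta` content of each streamed chunk. The sketch below is illustrative only, not kernel-memory code; `stream_text`, `ChatChunk`, and `fake_chat_stream` are hypothetical names, and a real implementation would call the ChatCompletions endpoint with `stream=True` instead of the fake stream used here.

```python
# Sketch: adapting ChatCompletions streaming to a text-generation interface.
# All names here are illustrative, not from kernel-memory.
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class ChatChunk:
    """Minimal stand-in for one streamed ChatCompletion chunk's delta content."""
    delta: Optional[str]


def stream_text(prompt: str,
                chat_stream: Callable[[str], Iterable[ChatChunk]]) -> Iterator[str]:
    """Wrap a plain prompt as a chat message and yield text deltas,
    matching the token-stream shape the legacy Completions API produced.

    A real implementation would send [{"role": "user", "content": prompt}]
    to the ChatCompletions endpoint with stream=True.
    """
    for chunk in chat_stream(prompt):
        if chunk.delta:  # the final chunk typically carries no content
            yield chunk.delta


def fake_chat_stream(prompt: str) -> Iterable[ChatChunk]:
    """Fake stream standing in for the API call, for demonstration only."""
    for piece in ["Hello", ", ", "world", None]:
        yield ChatChunk(delta=piece)


print("".join(stream_text("say hi", fake_chat_stream)))  # prints "Hello, world"
```

Because callers only see a stream of text fragments, swapping the underlying endpoint from Completions to ChatCompletions this way would not change the TextGeneration interface they depend on.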