.Net: Implement OnnxRuntimeGenAIChatCompletionService on OnnxRuntimeGenAIChatClient
The model seems to be loaded when the client is initialized; loading should happen only at runtime.
public OnnxRuntimeGenAIChatClient(string modelPath, OnnxRuntimeGenAIChatClientOptions? options = null)
{
//...
_model = new Model(modelPath);
_tokenizer = new Tokenizer(_model);
}
The model seems to be loaded when the client is initialized; loading should happen only at runtime.
We can, but why do we want to do that? Config failures won't be noticed until first use; additional code (not present in the current impl) would be needed to prevent concurrent callers from loading the likely multi-GB model multiple times; and the first use will be delayed by a potentially very long time, likely timing out.
Their 0.8.0 still relies on the 9.4 preview. Getting "Method not found" in the integration tests.
We can, but why do we want to do that?
I don't want to add behavioral changes to the IChatCompletionService that customers may already be relying on.
Config failures won't be noticed until first use; additional code (not present in the current impl) would be needed to prevent concurrent callers from loading the likely multi-GB model multiple times.
Currently the unit tests are failing because the model is loaded in the constructor. I agree that we should fail fast if the file does not exist, but not by loading the model.
Normally, for local model usage (for instance with Ollama), the model is loaded at request time, which is how local model applications have ultimately been built.
For this early scenario, I would also consider having the IChatCompletionService(Model) constructor use the ChatClient(model) ctor.
That adds the delay on the service implementation side, so it doesn't necessarily require a change to the original OnnxChatClient impl.
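To make the trade-off concrete, here is a minimal sketch of what delaying the load on the service side could look like. It is not the code in this PR: `StubModel` and `LazyModelService` are hypothetical stand-ins, and the real `Model` and `OnnxRuntimeGenAIChatClient` types are deliberately left out. It assumes the constructor should validate the path eagerly, while the expensive load is deferred to first use and guarded so that concurrent callers trigger it only once.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical stand-in for the multi-GB model; not the real OnnxRuntimeGenAI type.
public sealed class StubModel : IDisposable
{
    // Counts how many times a "load" actually happened.
    public static int LoadCount;

    public StubModel(string modelPath) => Interlocked.Increment(ref LoadCount);

    public void Dispose() { }
}

// Hypothetical service wrapper: validates the path eagerly, loads the model lazily.
public sealed class LazyModelService : IDisposable
{
    private readonly Lazy<StubModel> _model;

    public LazyModelService(string modelPath)
    {
        // Fail fast on an obviously bad configuration without loading anything.
        if (!Directory.Exists(modelPath) && !File.Exists(modelPath))
        {
            throw new FileNotFoundException($"Model path not found: {modelPath}");
        }

        // ExecutionAndPublication ensures the factory runs at most once,
        // even when several requests reach the service concurrently.
        _model = new Lazy<StubModel>(
            () => new StubModel(modelPath),
            LazyThreadSafetyMode.ExecutionAndPublication);
    }

    public bool IsModelLoaded => _model.IsValueCreated;

    // Placeholder for a chat call; the first invocation triggers the load.
    public string Complete(string prompt)
    {
        _ = _model.Value;
        return $"echo: {prompt}";
    }

    public void Dispose()
    {
        // Only dispose a model that was actually created.
        if (_model.IsValueCreated)
        {
            _model.Value.Dispose();
        }
    }
}
```

With this shape, constructing the service in a unit test does not touch the model, and the first `Complete` call pays the load cost. One caveat of `Lazy<T>` in this mode: if the factory throws, the exception is cached and rethrown on every later access, so a failed load is not retried.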
Their 0.8.0 still relies on the 9.4 preview. Getting "Method not found" in the integration tests.
Ugh, I thought 0.8.0 included the update to the stable dependency. We'll need to wait.
Updated to 0.8.1.
One unrelated integration test failed:
[xUnit.net 00:03:34.59] SemanticKernel.IntegrationTests.Connectors.OpenAI.OpenAIChatCompletionNonStreamingTests.ChatCompletionWithWebSearchAsync [FAIL]
[xUnit.net 00:03:34.59] Assert.NotEmpty() Failure: Collection was empty
[xUnit.net 00:03:34.59] Stack Trace:
[xUnit.net 00:03:34.59] /home/runner/work/semantic-kernel/semantic-kernel/dotnet/src/IntegrationTests/Connectors/OpenAI/OpenAIChatCompletion_NonStreamingTests.cs(162,0): at SemanticKernel.IntegrationTests.Connectors.OpenAI.OpenAIChatCompletionNonStreamingTests.ChatCompletionWithWebSearchAsync()
[xUnit.net 00:03:34.59] --- End of stack trace from previous location ---
More unrelated integration test failures:
[xUnit.net 00:01:22.94] SemanticKernel.IntegrationTests.Connectors.OpenAI.OpenAIChatCompletionNonStreamingTests.ChatCompletionWithAudioInputAndOutputAsync [FAIL]
[xUnit.net 00:01:22.95] Microsoft.SemanticKernel.HttpOperationException : Service request failed.
[xUnit.net 00:01:22.95] Status: 503 (Service Unavailable)
[xUnit.net 00:01:22.95]
[xUnit.net 00:01:22.95] ---- System.ClientModel.ClientResultException : Service request failed.
[xUnit.net 00:01:22.95] Status: 503 (Service Unavailable)
[xUnit.net 00:01:22.95]
[xUnit.net 00:01:22.95] Stack Trace:
[xUnit.net 00:01:22.95] /home/runner/work/semantic-kernel/semantic-kernel/dotnet/src/Connectors/Connectors.OpenAI/Core/ClientCore.cs(244,0): at Microsoft.SemanticKernel.Connectors.OpenAI.ClientCore.RunRequestAsync[T](Func`1 request)
[xUnit.net 00:01:22.95] /home/runner/work/semantic-kernel/semantic-kernel/dotnet/src/Connectors/Connectors.OpenAI/Core/ClientCore.ChatCompletion.cs(171,0): at Microsoft.SemanticKernel.Connectors.OpenAI.ClientCore.GetChatMessageContentsAsync(String targetModel, ChatHistory chatHistory, PromptExecutionSettings executionSettings, Kernel kernel, CancellationToken cancellationToken)
[xUnit.net 00:01:22.95] /home/runner/work/semantic-kernel/semantic-kernel/dotnet/src/SemanticKernel.Abstractions/AI/ChatCompletion/ChatCompletionServiceExtensions.cs(83,0): at Microsoft.SemanticKernel.ChatCompletion.ChatCompletionServiceExtensions.GetChatMessageContentAsync(IChatCompletionService chatCompletionService, ChatHistory chatHistory, PromptExecutionSettings executionSettings, Kernel kernel, CancellationToken cancellationToken)
[xUnit.net 00:04:05.05] SemanticKernel.IntegrationTests.Connectors.OpenAI.OpenAITextToAudioTests.OpenAITextToAudioTestAsync [FAIL]
[xUnit.net 00:04:05.05] System.Threading.Tasks.TaskCanceledException : The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
[xUnit.net 00:04:05.05] ---- System.TimeoutException : The operation was canceled.
[xUnit.net 00:04:05.05] -------- System.Threading.Tasks.TaskCanceledException : The operation was canceled.
[xUnit.net 00:04:05.05] ------------ System.IO.IOException : Unable to read data from the transport connection: Operation canceled.
[xUnit.net 00:04:05.05] ---------------- System.Net.Sockets.SocketException : Operation canceled
