.Net: Implement OnnxRuntimeGenAIChatCompletionService on OnnxRuntimeGenAIChatClient

Open stephentoub opened this issue 7 months ago • 6 comments

stephentoub avatar May 20 '25 14:05 stephentoub

The model seems to be loaded during initialization of the client; loading should happen only at runtime.

    public OnnxRuntimeGenAIChatClient(string modelPath, OnnxRuntimeGenAIChatClientOptions? options = null)
    {
        //...
        _model = new Model(modelPath);
        _tokenizer = new Tokenizer(_model);
    }

rogerbarreto avatar May 20 '25 17:05 rogerbarreto

The model seems to be loaded during initialization of the client; loading should happen only at runtime.

We can, but, why do we want to do that? Any config failures won't be noticed until use, additional code (not present in the current impl) is necessary to prevent concurrent usage from loading the likely multi-gb model multiple times, and first use will be delayed by a potentially very long time, likely timing out.
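For reference, the "additional code" needed to keep concurrent callers from loading the model more than once could look roughly like the sketch below. It is only an illustration: `FakeModel` and `LazyModelHolder` are hypothetical stand-ins, not types from ONNX Runtime GenAI or Semantic Kernel, and the 100 ms sleep merely simulates a slow load.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an expensive, multi-GB model load.
public sealed class FakeModel
{
    public static int LoadCount;

    public FakeModel(string path)
    {
        Interlocked.Increment(ref LoadCount);
        Thread.Sleep(100); // simulate a slow load
        Path = path;
    }

    public string Path { get; }
}

// Hypothetical holder that defers the load until first use.
public sealed class LazyModelHolder
{
    private readonly Lazy<FakeModel> _model;

    public LazyModelHolder(string modelPath)
    {
        // ExecutionAndPublication: exactly one thread runs the factory;
        // other callers block until the value is available.
        _model = new Lazy<FakeModel>(
            () => new FakeModel(modelPath),
            LazyThreadSafetyMode.ExecutionAndPublication);
    }

    public FakeModel Model => _model.Value;
}

public static class Demo
{
    public static void Main()
    {
        var holder = new LazyModelHolder("path/to/model");

        // Eight concurrent first uses still trigger a single load.
        Parallel.For(0, 8, i => { FakeModel m = holder.Model; });

        Console.WriteLine($"Loads: {FakeModel.LoadCount}"); // prints "Loads: 1"
    }
}
```

Note that in this mode `Lazy<T>` caches an exception thrown by the factory, so a failed load is rethrown on every later access, and the first caller still pays the full load time.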

stephentoub avatar May 20 '25 17:05 stephentoub

Their 0.8.0 still relies on the 9.4 preview. Getting "Method not found" in the integration tests.

rogerbarreto avatar May 20 '25 17:05 rogerbarreto

We can, but, why do we want to do that?

Don't want to add behavioral changes to the IChatCompletionService that customers may already be relying on.

Any config failures won't be noticed until use, additional code (not present in the current impl) is necessary to prevent concurrent usage from loading the likely multi-gb model multiple times.

Currently the unit tests are failing because the model is loaded. I agree that it should fail fast if the file does not exist, but not by loading the model.
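A minimal sketch of that kind of fail-fast check, without touching the model itself, might look like this. `DeferredOnnxService` is a hypothetical name, not an existing Semantic Kernel type, and the sketch assumes the model path may be either a folder or a single file.

```csharp
using System;
using System.IO;

// Hypothetical wrapper; not the actual Semantic Kernel or ONNX Runtime GenAI type.
public sealed class DeferredOnnxService
{
    public DeferredOnnxService(string modelPath)
    {
        if (string.IsNullOrWhiteSpace(modelPath))
        {
            throw new ArgumentException("Model path must be provided.", nameof(modelPath));
        }

        // Cheap existence check only; nothing is read or loaded here.
        // Assumption: the path may point at a model folder or a single file.
        if (!Directory.Exists(modelPath) && !File.Exists(modelPath))
        {
            throw new DirectoryNotFoundException($"Model path not found: {modelPath}");
        }

        ModelPath = modelPath;
    }

    public string ModelPath { get; }
}

public static class Program
{
    public static void Main()
    {
        // A path that does not exist fails in the constructor.
        string missing = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
        try
        {
            new DeferredOnnxService(missing);
        }
        catch (DirectoryNotFoundException ex)
        {
            Console.WriteLine($"Failed fast: {ex.Message}");
        }

        // An existing path is accepted without any model being loaded.
        var ok = new DeferredOnnxService(Path.GetTempPath());
        Console.WriteLine($"Validated without loading: {ok.ModelPath}");
    }
}
```

This only catches a missing path at construction time; an invalid or corrupt model configuration would still surface on first use.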

Normally, for local model usage (for instance, with Ollama), the model gets loaded at request time, which is ultimately how local model applications have been built.

I would also consider, for this early scenario, having the IChatCompletionService(Model) ctor use the ChatClient(model) ctor.

rogerbarreto avatar May 20 '25 17:05 rogerbarreto

Adding the delay on the service implementation side, so it doesn't necessarily require a change to the original OnnxChatClient impl.

rogerbarreto avatar May 20 '25 17:05 rogerbarreto

Their 0.8.0 still relies on the 9.4 preview. Getting "Method not found" in the integration tests.

Ugh, I thought 0.8.0 included the update to the stable dependency. We'll need to wait.

stephentoub avatar May 20 '25 18:05 stephentoub

Updated to 0.8.1

stephentoub avatar Jun 03 '25 01:06 stephentoub

One unrelated integration test failed

[xUnit.net 00:03:34.59]     SemanticKernel.IntegrationTests.Connectors.OpenAI.OpenAIChatCompletionNonStreamingTests.ChatCompletionWithWebSearchAsync [FAIL]
[xUnit.net 00:03:34.59]       Assert.NotEmpty() Failure: Collection was empty
[xUnit.net 00:03:34.59]       Stack Trace:
[xUnit.net 00:03:34.59]         /home/runner/work/semantic-kernel/semantic-kernel/dotnet/src/IntegrationTests/Connectors/OpenAI/OpenAIChatCompletion_NonStreamingTests.cs(162,0): at SemanticKernel.IntegrationTests.Connectors.OpenAI.OpenAIChatCompletionNonStreamingTests.ChatCompletionWithWebSearchAsync()
[xUnit.net 00:03:34.59]         --- End of stack trace from previous location ---

markwallace-microsoft avatar Jun 10 '25 08:06 markwallace-microsoft

More unrelated integration test failures

[xUnit.net 00:01:22.94]     SemanticKernel.IntegrationTests.Connectors.OpenAI.OpenAIChatCompletionNonStreamingTests.ChatCompletionWithAudioInputAndOutputAsync [FAIL]
[xUnit.net 00:01:22.95]       Microsoft.SemanticKernel.HttpOperationException : Service request failed.
[xUnit.net 00:01:22.95]       Status: 503 (Service Unavailable)
[xUnit.net 00:01:22.95]       
[xUnit.net 00:01:22.95]       ---- System.ClientModel.ClientResultException : Service request failed.
[xUnit.net 00:01:22.95]       Status: 503 (Service Unavailable)
[xUnit.net 00:01:22.95]       
[xUnit.net 00:01:22.95]       Stack Trace:
[xUnit.net 00:01:22.95]         /home/runner/work/semantic-kernel/semantic-kernel/dotnet/src/Connectors/Connectors.OpenAI/Core/ClientCore.cs(244,0): at Microsoft.SemanticKernel.Connectors.OpenAI.ClientCore.RunRequestAsync[T](Func`1 request)
[xUnit.net 00:01:22.95]         /home/runner/work/semantic-kernel/semantic-kernel/dotnet/src/Connectors/Connectors.OpenAI/Core/ClientCore.ChatCompletion.cs(171,0): at Microsoft.SemanticKernel.Connectors.OpenAI.ClientCore.GetChatMessageContentsAsync(String targetModel, ChatHistory chatHistory, PromptExecutionSettings executionSettings, Kernel kernel, CancellationToken cancellationToken)
[xUnit.net 00:01:22.95]         /home/runner/work/semantic-kernel/semantic-kernel/dotnet/src/SemanticKernel.Abstractions/AI/ChatCompletion/ChatCompletionServiceExtensions.cs(83,0): at Microsoft.SemanticKernel.ChatCompletion.ChatCompletionServiceExtensions.GetChatMessageContentAsync(IChatCompletionService chatCompletionService, ChatHistory chatHistory, PromptExecutionSettings executionSettings, Kernel kernel, CancellationToken cancellationToken)


[xUnit.net 00:04:05.05]     SemanticKernel.IntegrationTests.Connectors.OpenAI.OpenAITextToAudioTests.OpenAITextToAudioTestAsync [FAIL]
[xUnit.net 00:04:05.05]       System.Threading.Tasks.TaskCanceledException : The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
[xUnit.net 00:04:05.05]       ---- System.TimeoutException : The operation was canceled.
[xUnit.net 00:04:05.05]       -------- System.Threading.Tasks.TaskCanceledException : The operation was canceled.
[xUnit.net 00:04:05.05]       ------------ System.IO.IOException : Unable to read data from the transport connection: Operation canceled.
[xUnit.net 00:04:05.05]       ---------------- System.Net.Sockets.SocketException : Operation canceled

markwallace-microsoft avatar Jun 10 '25 15:06 markwallace-microsoft