[BUG] Exception when running an OpenAI streaming model with complex tool parameters
Describe the bug When using the OpenAI streaming model with tool parameters more complex than a string, the token-estimation system throws an exception. This seems to be caused by the default JSON parser being hard-coded to decode Map.class as Map<String, String> instead of following the default Gson behavior (which keeps nested values as maps/lists).
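The difference between the two decoding strategies can be shown with plain Gson, independent of langchain4j (a minimal sketch; it reproduces the same JsonSyntaxException as in the trace below, just with BEGIN_OBJECT instead of BEGIN_ARRAY):

```java
import com.google.gson.Gson;
import com.google.gson.JsonSyntaxException;
import com.google.gson.reflect.TypeToken;
import java.util.Map;

public class GsonMapDecoding {
    public static void main(String[] args) {
        Gson gson = new Gson();
        // A tool-parameter value that is an object, not a string
        String json = "{\"list\": {\"type\": \"array\"}}";

        // Default Gson behavior for Map.class: nested values stay as maps/lists
        Map<?, ?> ok = gson.fromJson(json, Map.class);
        System.out.println(ok.get("list") instanceof Map); // true

        // Forcing Map<String, String> fails on any non-string value
        try {
            gson.fromJson(json, new TypeToken<Map<String, String>>() {}.getType());
        } catch (JsonSyntaxException e) {
            // "Expected a string but was BEGIN_OBJECT ..."
            System.out.println("threw: " + e.getMessage());
        }
    }
}
```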
Log and Stack trace
Exception in thread "main" java.util.concurrent.CompletionException: com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected a string but was BEGIN_ARRAY at line 1 column 10 path $.
at java.base/java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:413)
at java.base/java.util.concurrent.CompletableFuture.join(CompletableFuture.java:2118)
at ca.codebuddy.demoapp.Main.main(Main.java:68)
Caused by: com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected a string but was BEGIN_ARRAY at line 1 column 10 path $.
at com.google.gson.Gson.fromJson(Gson.java:1238)
at com.google.gson.Gson.fromJson(Gson.java:1137)
at com.google.gson.Gson.fromJson(Gson.java:1047)
at com.google.gson.Gson.fromJson(Gson.java:1014)
at dev.langchain4j.internal.GsonJsonCodec.fromJson(GsonJsonCodec.java:64)
at dev.langchain4j.internal.Json.fromJson(Json.java:66)
at dev.langchain4j.model.openai.OpenAiTokenizer.countArguments(OpenAiTokenizer.java:334)
at dev.langchain4j.model.openai.OpenAiTokenizer.estimateTokenCountInToolExecutionRequests(OpenAiTokenizer.java:266)
at dev.langchain4j.model.openai.OpenAiTokenizer.estimateTokenCountInForcefulToolExecutionRequest(OpenAiTokenizer.java:314)
at dev.langchain4j.model.openai.OpenAiStreamingResponseBuilder.tokenUsage(OpenAiStreamingResponseBuilder.java:192)
at dev.langchain4j.model.openai.OpenAiStreamingResponseBuilder.build(OpenAiStreamingResponseBuilder.java:167)
at dev.langchain4j.model.openai.OpenAiStreamingChatModel.lambda$generate$2(OpenAiStreamingChatModel.java:158)
at dev.ai4j.openai4j.StreamingRequestExecutor$2.onEvent(StreamingRequestExecutor.java:170)
at okhttp3.internal.sse.RealEventSource.onEvent(RealEventSource.kt:101)
at okhttp3.internal.sse.ServerSentEventReader.completeEvent(ServerSentEventReader.kt:108)
at okhttp3.internal.sse.ServerSentEventReader.processNextEvent(ServerSentEventReader.kt:52)
at okhttp3.internal.sse.RealEventSource.processResponse(RealEventSource.kt:75)
at okhttp3.internal.sse.RealEventSource.onResponse(RealEventSource.kt:46)
at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:519)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.IllegalStateException: Expected a string but was BEGIN_ARRAY at line 1 column 10 path $.
at com.google.gson.stream.JsonReader.nextString(JsonReader.java:836)
at com.google.gson.internal.bind.TypeAdapters$15.read(TypeAdapters.java:421)
at com.google.gson.internal.bind.TypeAdapters$15.read(TypeAdapters.java:409)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:40)
at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:186)
at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:144)
at com.google.gson.Gson.fromJson(Gson.java:1227)
... 21 more
To Reproduce
public static void main(String[] args) {
OpenAiStreamingChatModel chatModel = OpenAiStreamingChatModel.builder()
.apiKey(System.getenv("OPENAI_API_KEY"))
.modelName("gpt-3.5-turbo-0125")
.build();
Gson gson = new Gson();
Map<String, Map<String, Object>> properties = gson.fromJson("""
{
"list": {
"type": "array",
"items": {
"type": "string"
},
"description": "The output list"
}
}
""", Map.class);
ToolParameters params = ToolParameters.builder()
.required(Collections.singletonList("list"))
.properties(properties).build();
ToolSpecification spec = ToolSpecification.builder()
.name("list")
.description("List of US presidents")
.parameters(params)
.build();
CompletableFuture<Response<AiMessage>> future = new CompletableFuture<>();
List<ChatMessage> messages = Collections.singletonList(new UserMessage("List the US presidents"));
chatModel.generate(messages, spec, new StreamingResponseHandler<>() {
@Override
public void onNext(String s) {
System.out.println("Next: " + s);
}
@Override
public void onError(Throwable throwable) {
future.completeExceptionally(throwable);
}
@Override
public void onComplete(Response<AiMessage> response) {
future.complete(response);
}
});
final Response<AiMessage> response = future.join();
System.out.println(response.content().toolExecutionRequests().get(0).arguments());
}
Expected behavior There should not be an exception. The same code works fine with the non-streaming model.
Please complete the following information:
- LangChain4j version: 0.26.1
- Java version: 17
- Spring Boot version (if applicable): none
Additional context Replacing the tokenizer with one that uses plain Gson fixed the issue for me, but that is probably not the "proper" solution.
@Crokoking thanks a lot for reporting!
Has this been fixed?
Can you help me with the code below? I am getting the same error. What should I change?
How can I replace the tokenizer, as suggested above ("Replacing the tokenizer with one that uses base Gson fixed the issue for me but that is probably not the 'proper' solution")?
@Bean
Tokenizer tokenizer() {
    return new OpenAiTokenizer(MODEL_NAME);
}

@Bean
CommandLineRunner ingestDocsForLangChain(
        EmbeddingModel embeddingModel,
        EmbeddingStore<TextSegment> embeddingStore,
        Tokenizer tokenizer,
        ResourceLoader resourceLoader) throws IOException {
    return args -> {
        Resource resource = resourceLoader.getResource("classpath:service.txt");
        var service = loadDocument(resource.getFile().toPath(), new TextDocumentParser());
        DocumentSplitter documentSplitter = DocumentSplitters.recursive(200, 0, tokenizer);
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .documentSplitter(documentSplitter)
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .build();
        ingestor.ingest(List.of(service));
    };
}
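Assuming you already have a custom Tokenizer implementation (for example a copy of OpenAiTokenizer that parses arguments with plain Gson, as described earlier in this thread), swapping it in should only require changing the @Bean method; CustomGsonTokenizer is a hypothetical class name for illustration:

```java
@Bean
Tokenizer tokenizer() {
    // CustomGsonTokenizer is a made-up name for a copy of OpenAiTokenizer
    // whose argument parsing uses a plain Gson instance instead of the
    // internal Json codec (see the workaround described in this thread)
    return new CustomGsonTokenizer(MODEL_NAME);
}
```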
I made a copy of the OpenAiTokenizer class, added a Gson field, and replaced the two calls to Json.fromJson() with calls to gson.fromJson(). Then I just set that new class as the tokenizer when creating my ChatModel.
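For anyone wanting to sketch that workaround: the essential change is parsing the tool-arguments JSON with a plain Gson instance, which tolerates nested arrays and objects. A minimal illustration below; GsonArgumentParser is a made-up name, and in the actual workaround this logic replaces the Json.fromJson(...) calls inside the copied OpenAiTokenizer:

```java
import com.google.gson.Gson;
import java.util.Map;

public class GsonArgumentParser {
    private static final Gson GSON = new Gson();

    // Plain Map.class keeps nested arrays/objects intact, so arguments like
    // {"list": ["a", "b"]} no longer trigger "Expected a string but was BEGIN_ARRAY"
    static Map<?, ?> parseArguments(String argumentsJson) {
        return GSON.fromJson(argumentsJson, Map.class);
    }

    public static void main(String[] args) {
        Map<?, ?> parsed = parseArguments("{\"list\": [\"Washington\", \"Adams\"]}");
        System.out.println(parsed.get("list")); // a List, not a String
    }
}
```

The resulting tokenizer can then be passed to the model builder (if I recall the 0.26.x API correctly, OpenAiStreamingChatModel.builder() accepts a .tokenizer(...) parameter).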