
[BUG] Exception when running an OpenAI streaming model with complex tool parameters

Open · Crokoking opened this issue 2 years ago

Describe the bug When using the OpenAI streaming model with tool parameters more complex than a string, the token-estimation system throws an exception. The cause appears to be that the default JSON codec is hard-coded to decode Map.class as Map<String, String>, rather than following Gson's default behavior of decoding nested values as List/Map.
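To illustrate the root cause in isolation (a standalone sketch using only Gson, not langchain4j code): decoding into a generic map keeps nested arrays and objects, while forcing every value to be a String fails on the first array, reproducing the reported error.

```java
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;

import java.util.Map;

public class GsonMapDecoding {

    static final String JSON = "{\"items\": [\"a\", \"b\"]}";

    // Lenient decoding: nested arrays/objects become List/Map,
    // so complex tool parameters survive.
    static Map<String, Object> decodeLenient(String json) {
        return new Gson().fromJson(json, new TypeToken<Map<String, Object>>() {}.getType());
    }

    // Strict decoding: Gson expects every value to be a String and
    // throws JsonSyntaxException on the array ("Expected a string but
    // was BEGIN_ARRAY"), mirroring the stack trace above.
    static Map<String, String> decodeStrict(String json) {
        return new Gson().fromJson(json, new TypeToken<Map<String, String>>() {}.getType());
    }

    public static void main(String[] args) {
        System.out.println(decodeLenient(JSON).get("items")); // a List, not a String
        try {
            decodeStrict(JSON);
        } catch (com.google.gson.JsonSyntaxException e) {
            System.out.println(e.getMessage());
        }
    }
}
```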

Log and Stack trace

Exception in thread "main" java.util.concurrent.CompletionException: com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected a string but was BEGIN_ARRAY at line 1 column 10 path $.
	at java.base/java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:413)
	at java.base/java.util.concurrent.CompletableFuture.join(CompletableFuture.java:2118)
	at ca.codebuddy.demoapp.Main.main(Main.java:68)
Caused by: com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected a string but was BEGIN_ARRAY at line 1 column 10 path $.
	at com.google.gson.Gson.fromJson(Gson.java:1238)
	at com.google.gson.Gson.fromJson(Gson.java:1137)
	at com.google.gson.Gson.fromJson(Gson.java:1047)
	at com.google.gson.Gson.fromJson(Gson.java:1014)
	at dev.langchain4j.internal.GsonJsonCodec.fromJson(GsonJsonCodec.java:64)
	at dev.langchain4j.internal.Json.fromJson(Json.java:66)
	at dev.langchain4j.model.openai.OpenAiTokenizer.countArguments(OpenAiTokenizer.java:334)
	at dev.langchain4j.model.openai.OpenAiTokenizer.estimateTokenCountInToolExecutionRequests(OpenAiTokenizer.java:266)
	at dev.langchain4j.model.openai.OpenAiTokenizer.estimateTokenCountInForcefulToolExecutionRequest(OpenAiTokenizer.java:314)
	at dev.langchain4j.model.openai.OpenAiStreamingResponseBuilder.tokenUsage(OpenAiStreamingResponseBuilder.java:192)
	at dev.langchain4j.model.openai.OpenAiStreamingResponseBuilder.build(OpenAiStreamingResponseBuilder.java:167)
	at dev.langchain4j.model.openai.OpenAiStreamingChatModel.lambda$generate$2(OpenAiStreamingChatModel.java:158)
	at dev.ai4j.openai4j.StreamingRequestExecutor$2.onEvent(StreamingRequestExecutor.java:170)
	at okhttp3.internal.sse.RealEventSource.onEvent(RealEventSource.kt:101)
	at okhttp3.internal.sse.ServerSentEventReader.completeEvent(ServerSentEventReader.kt:108)
	at okhttp3.internal.sse.ServerSentEventReader.processNextEvent(ServerSentEventReader.kt:52)
	at okhttp3.internal.sse.RealEventSource.processResponse(RealEventSource.kt:75)
	at okhttp3.internal.sse.RealEventSource.onResponse(RealEventSource.kt:46)
	at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:519)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.IllegalStateException: Expected a string but was BEGIN_ARRAY at line 1 column 10 path $.
	at com.google.gson.stream.JsonReader.nextString(JsonReader.java:836)
	at com.google.gson.internal.bind.TypeAdapters$15.read(TypeAdapters.java:421)
	at com.google.gson.internal.bind.TypeAdapters$15.read(TypeAdapters.java:409)
	at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:40)
	at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:186)
	at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:144)
	at com.google.gson.Gson.fromJson(Gson.java:1227)
	... 21 more

To Reproduce

public static void main(String[] args) {
        OpenAiStreamingChatModel chatModel = OpenAiStreamingChatModel.builder()
            .apiKey(System.getenv("OPENAI_API_KEY"))
            .modelName("gpt-3.5-turbo-0125")
            .build();

        Gson gson = new Gson();
        Map<String, Map<String, Object>> properties = gson.fromJson("""
            {
                "list": {
                    "type": "array",
                    "items": {
                      "type": "string"
                    },
                    "description": "The output list"
                }
            }
            """, Map.class);


        ToolParameters params = ToolParameters.builder()
            .required(Collections.singletonList("list"))
            .properties(properties).build();
        ToolSpecification spec = ToolSpecification.builder()
            .name("list")
            .description("List of US presidents")
            .parameters(params)
            .build();
        CompletableFuture<Response<AiMessage>> future = new CompletableFuture<>();

        List<ChatMessage> messages = Collections.singletonList(new UserMessage("List the US presidents"));
        chatModel.generate(messages, spec, new StreamingResponseHandler<>() {

            @Override
            public void onNext(String s) {
                System.out.println("Next: " + s);
            }

            @Override
            public void onError(Throwable throwable) {
                future.completeExceptionally(throwable);
            }

            @Override
            public void onComplete(Response<AiMessage> response) {
                future.complete(response);
            }
        });
        final Response<AiMessage> response = future.join();
        System.out.println(response.content().toolExecutionRequests().get(0).arguments());
    }

Expected behavior There should not be an exception. It works fine with the non-streaming model.

Please complete the following information:

  • LangChain4j version: 0.26.1
  • Java version: 17
  • Spring Boot version (if applicable): none

Additional context Replacing the tokenizer with one that uses plain Gson fixed the issue for me, but that is probably not the "proper" solution.

Crokoking avatar Feb 04 '24 12:02 Crokoking

@Crokoking thanks a lot for reporting!

langchain4j avatar Feb 23 '24 17:02 langchain4j

Has this been fixed?

cslcsl490 avatar Mar 26 '24 09:03 cslcsl490

Can you help me with the code below? I am getting the same error; what should I change?

How can I replace the tokenizer? (You wrote that replacing the tokenizer with one that uses plain Gson fixed the issue, but that it is probably not the "proper" solution.)

@Bean
Tokenizer tokenizer() {
    return new OpenAiTokenizer(MODEL_NAME);
}

@Bean
CommandLineRunner ingestDocsForLangChain(
        EmbeddingModel embeddingModel,
        EmbeddingStore<TextSegment> embeddingStore,
        Tokenizer tokenizer,
        ResourceLoader resourceLoader) throws IOException {
    return args -> {
        Resource resource = resourceLoader.getResource("classpath:service.txt");
        var service = loadDocument(resource.getFile().toPath(), new TextDocumentParser());

        DocumentSplitter documentSplitter = DocumentSplitters.recursive(200, 0, tokenizer);

        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .documentSplitter(documentSplitter)
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .build();
        ingestor.ingest(List.of(service));
    };
}

deepakn27 avatar Apr 05 '24 20:04 deepakn27

I made a copy of the OpenAiTokenizer class, added a Gson field, and replaced the two calls to Json.fromJson() with calls to gson.fromJson(). Then I just set that new class as the tokenizer when creating my chat model.
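The core of that workaround can be sketched in isolation (a hypothetical, standalone example: OpenAiTokenizer's internals are private, so the real fix is a full copy of the class; only the changed parsing logic is shown here, and `parseArguments` is a made-up name for illustration):

```java
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;

import java.util.Map;

public class FixedTokenizerSketch {

    private static final Gson GSON = new Gson();

    // In the copied tokenizer, the calls to Json.fromJson(arguments, Map.class)
    // would be replaced with something like this: plain Gson decodes nested
    // arrays/objects as List/Map instead of forcing Map<String, String>,
    // which avoids the "Expected a string but was BEGIN_ARRAY" error.
    static Map<String, Object> parseArguments(String arguments) {
        return GSON.fromJson(arguments, new TypeToken<Map<String, Object>>() {}.getType());
    }

    public static void main(String[] args) {
        // Arguments with an array value, like the reproducer's "list" tool:
        Map<String, Object> parsed = parseArguments("{\"list\": [\"Washington\", \"Adams\"]}");
        System.out.println(parsed);
    }
}
```

The copied class can then be supplied when building the model (the builder accepted a custom tokenizer in 0.26.x, if I recall correctly), or exposed as the `Tokenizer` `@Bean` in the Spring setup above so the `DocumentSplitter` picks it up.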

Crokoking avatar Apr 05 '24 20:04 Crokoking