[BUG] Exception when running an OpenAI streaming model with complex tool parameters
Describe the bug When using the OpenAI streaming model with tool parameters more complex than a string, the token-estimation system throws an exception. This seems to be caused by the default JSON parser being hard-coded to decode Map.class as Map<String, String> instead of following the default Gson behavior (which keeps nested values as maps/lists).
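The difference between the two decoding strategies can be shown with plain Gson, independent of langchain4j (a minimal sketch; it reproduces the same JsonSyntaxException as in the trace below, just with BEGIN_OBJECT instead of BEGIN_ARRAY):

```java
import com.google.gson.Gson;
import com.google.gson.JsonSyntaxException;
import com.google.gson.reflect.TypeToken;
import java.util.Map;

public class GsonMapDecoding {
    public static void main(String[] args) {
        Gson gson = new Gson();
        // A tool-parameter value that is an object, not a string
        String json = "{\"list\": {\"type\": \"array\"}}";

        // Default Gson behavior for Map.class: nested values stay as maps/lists
        Map<?, ?> ok = gson.fromJson(json, Map.class);
        System.out.println(ok.get("list") instanceof Map); // true

        // Forcing Map<String, String> fails on any non-string value
        try {
            gson.fromJson(json, new TypeToken<Map<String, String>>() {}.getType());
        } catch (JsonSyntaxException e) {
            // "Expected a string but was BEGIN_OBJECT ..."
            System.out.println("threw: " + e.getMessage());
        }
    }
}
```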
Log and Stack trace
Exception in thread "main" java.util.concurrent.CompletionException: com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected a string but was BEGIN_ARRAY at line 1 column 10 path $.
at java.base/java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:413)
at java.base/java.util.concurrent.CompletableFuture.join(CompletableFuture.java:2118)
at ca.codebuddy.demoapp.Main.main(Main.java:68)
Caused by: com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected a string but was BEGIN_ARRAY at line 1 column 10 path $.
at com.google.gson.Gson.fromJson(Gson.java:1238)
at com.google.gson.Gson.fromJson(Gson.java:1137)
at com.google.gson.Gson.fromJson(Gson.java:1047)
at com.google.gson.Gson.fromJson(Gson.java:1014)
at dev.langchain4j.internal.GsonJsonCodec.fromJson(GsonJsonCodec.java:64)
at dev.langchain4j.internal.Json.fromJson(Json.java:66)
at dev.langchain4j.model.openai.OpenAiTokenizer.countArguments(OpenAiTokenizer.java:334)
at dev.langchain4j.model.openai.OpenAiTokenizer.estimateTokenCountInToolExecutionRequests(OpenAiTokenizer.java:266)
at dev.langchain4j.model.openai.OpenAiTokenizer.estimateTokenCountInForcefulToolExecutionRequest(OpenAiTokenizer.java:314)
at dev.langchain4j.model.openai.OpenAiStreamingResponseBuilder.tokenUsage(OpenAiStreamingResponseBuilder.java:192)
at dev.langchain4j.model.openai.OpenAiStreamingResponseBuilder.build(OpenAiStreamingResponseBuilder.java:167)
at dev.langchain4j.model.openai.OpenAiStreamingChatModel.lambda$generate$2(OpenAiStreamingChatModel.java:158)
at dev.ai4j.openai4j.StreamingRequestExecutor$2.onEvent(StreamingRequestExecutor.java:170)
at okhttp3.internal.sse.RealEventSource.onEvent(RealEventSource.kt:101)
at okhttp3.internal.sse.ServerSentEventReader.completeEvent(ServerSentEventReader.kt:108)
at okhttp3.internal.sse.ServerSentEventReader.processNextEvent(ServerSentEventReader.kt:52)
at okhttp3.internal.sse.RealEventSource.processResponse(RealEventSource.kt:75)
at okhttp3.internal.sse.RealEventSource.onResponse(RealEventSource.kt:46)
at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:519)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.IllegalStateException: Expected a string but was BEGIN_ARRAY at line 1 column 10 path $.
at com.google.gson.stream.JsonReader.nextString(JsonReader.java:836)
at com.google.gson.internal.bind.TypeAdapters$15.read(TypeAdapters.java:421)
at com.google.gson.internal.bind.TypeAdapters$15.read(TypeAdapters.java:409)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:40)
at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:186)
at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:144)
at com.google.gson.Gson.fromJson(Gson.java:1227)
... 21 more
To Reproduce
public static void main(String[] args) {
OpenAiStreamingChatModel chatModel = OpenAiStreamingChatModel.builder()
.apiKey(System.getenv("OPENAI_API_KEY"))
.modelName("gpt-3.5-turbo-0125")
.build();
Gson gson = new Gson();
Map<String, Map<String, Object>> properties = gson.fromJson("""
{
"list": {
"type": "array",
"items": {
"type": "string"
},
"description": "The output list"
}
}
""", Map.class);
ToolParameters params = ToolParameters.builder()
.required(Collections.singletonList("list"))
.properties(properties).build();
ToolSpecification spec = ToolSpecification.builder()
.name("list")
.description("List of US presidents")
.parameters(params)
.build();
CompletableFuture<Response<AiMessage>> future = new CompletableFuture<>();
List<ChatMessage> messages = Collections.singletonList(new UserMessage("List the US presidents"));
chatModel.generate(messages, spec, new StreamingResponseHandler<>() {
@Override
public void onNext(String s) {
System.out.println("Next: " + s);
}
@Override
public void onError(Throwable throwable) {
future.completeExceptionally(throwable);
}
@Override
public void onComplete(Response<AiMessage> response) {
future.complete(response);
}
});
final Response<AiMessage> response = future.join();
System.out.println(response.content().toolExecutionRequests().get(0).arguments());
}
Expected behavior There should not be an exception. The same code works fine with the non-streaming model.
Please complete the following information:
- LangChain4j version: 0.26.1
- Java version: 17
- Spring Boot version (if applicable): none
Additional context Replacing the tokenizer with one that uses plain Gson fixed the issue for me, but that is probably not the "proper" solution.
@Crokoking thanks a lot for reporting!
Has this been fixed?
Can you help me with the code below? I am getting the same error. What should I change?
How can I replace the tokenizer, as suggested above ("Replacing the tokenizer with one that uses base Gson fixed the issue for me but that is probably not the 'proper' solution")?
@Bean
Tokenizer tokenizer() {
    return new OpenAiTokenizer(MODEL_NAME);
}

@Bean
CommandLineRunner ingestDocsForLangChain(
        EmbeddingModel embeddingModel,
        EmbeddingStore<TextSegment> embeddingStore,
        Tokenizer tokenizer,
        ResourceLoader resourceLoader) throws IOException {
    return args -> {
        Resource resource = resourceLoader.getResource("classpath:service.txt");
        var service = loadDocument(resource.getFile().toPath(), new TextDocumentParser());
        DocumentSplitter documentSplitter = DocumentSplitters.recursive(200, 0, tokenizer);
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .documentSplitter(documentSplitter)
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .build();
        ingestor.ingest(List.of(service));
    };
}
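Assuming you already have a custom Tokenizer implementation (for example a copy of OpenAiTokenizer that parses arguments with plain Gson, as described earlier in this thread), swapping it in should only require changing the @Bean method; CustomGsonTokenizer is a hypothetical class name for illustration:

```java
@Bean
Tokenizer tokenizer() {
    // CustomGsonTokenizer is a made-up name for a copy of OpenAiTokenizer
    // whose argument parsing uses a plain Gson instance instead of the
    // internal Json codec (see the workaround described in this thread)
    return new CustomGsonTokenizer(MODEL_NAME);
}
```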
I made a copy of the OpenAiTokenizer class, added a Gson field, and replaced the two calls to Json.fromJson() with calls to gson.fromJson(). Then I just set that new class as the tokenizer when creating my ChatModel.
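For anyone wanting to sketch that workaround: the essential change is parsing the tool-arguments JSON with a plain Gson instance, which tolerates nested arrays and objects. A minimal illustration below; GsonArgumentParser is a made-up name, and in the actual workaround this logic replaces the Json.fromJson(...) calls inside the copied OpenAiTokenizer:

```java
import com.google.gson.Gson;
import java.util.Map;

public class GsonArgumentParser {
    private static final Gson GSON = new Gson();

    // Plain Map.class keeps nested arrays/objects intact, so arguments like
    // {"list": ["a", "b"]} no longer trigger "Expected a string but was BEGIN_ARRAY"
    static Map<?, ?> parseArguments(String argumentsJson) {
        return GSON.fromJson(argumentsJson, Map.class);
    }

    public static void main(String[] args) {
        Map<?, ?> parsed = parseArguments("{\"list\": [\"Washington\", \"Adams\"]}");
        System.out.println(parsed.get("list")); // a List, not a String
    }
}
```

The resulting tokenizer can then be passed to the model builder (if I recall the 0.26.x API correctly, OpenAiStreamingChatModel.builder() accepts a .tokenizer(...) parameter).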