Support GPT4All Server API
Description of the solution:
I want to use JabRef's AI feature locally. There are multiple applications out there that provide a local server API, and they very often offer an API that resembles the OpenAI API.
GPT4All is such an application. Others are llama.cpp, Ollama, LM Studio, Jan, and KoboldCpp. I am sure there are more, but those are the best-known ones.
The great advantage of these applications is that they offer more samplers, GPU acceleration, broader hardware support, and support for models that have not been added to JabRef.
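To make concrete what "resembles the OpenAI API" means: these servers expose the same `/v1/chat/completions` endpoint and JSON payload shape, so a client only needs a different base URL. The sketch below is illustrative, not JabRef code; port 4891 is GPT4All's default, and the model name is just an example.

```python
import json

def build_chat_request(base_url, model, messages, max_tokens=None):
    """Return (url, body) for an OpenAI-compatible chat completion call.

    The payload shape is identical whether base_url points at
    https://api.openai.com or at a local server such as GPT4All.
    """
    payload = {"model": model, "messages": messages, "temperature": 0.7}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    return f"{base_url}/v1/chat/completions", json.dumps(payload)

# Against a local GPT4All server instead of OpenAI:
url, body = build_chat_request(
    "http://localhost:4891",
    "Phi-3.1-mini-128k-instruct",
    [{"role": "user", "content": "Hello"}],
    max_tokens=2048,
)
```

Because only the base URL changes, supporting these applications in JabRef is mostly a matter of making the endpoint configurable.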
Problem
It kind of works with GPT4All already, but something is wrong. I believe the embeddings are not sent together with the prompt, and responses look like they are cut off in the middle.
GPT4All:
JabRef:
Additional context
- GPT4All documentation for local API server
- The Phi-3.1-mini-128k-instruct model can be downloaded here: https://huggingface.co/GPT4All-Community/Phi-3.1-mini-128k-instruct-GGUF/tree/main. Just move the model file into the model directory of GPT4All and then configure it in the model settings, as shown in my screenshots below.
- Documentation about how to configure other custom models: https://github.com/nomic-ai/gpt4all/wiki/Configuring-Custom-Models.
JabRef preferences:
GPT4All preferences:
GPT4All model settings 1:
GPT4All model settings 2:
I also often get these errors/warnings on the command line when I try to send messages while connected to the GPT4All server API. Not sure if related.
```
2024-10-01 13:13:53 [pool-2-thread-4] org.jabref.logic.ai.chatting.AiChatLogic.execute()
INFO: Sending message to AI provider (https://api.openai.com/v1) for answering in entry CooperEtAl200708cah: What are the authors of the paper?
2024-10-01 13:13:53 [JavaFX Application Thread] org.jabref.gui.ai.components.aichat.AiChatComponent.lambda$onSendMessage$11()
ERROR: Got an error while sending a message to AI: io.github.stefanbratanov.jvm.openai.OpenAIException: 400 - message: Invalid 'messages[2].role': did not expect 'user' here, type: invalid_request_error, param: null, code: null
at [email protected]/io.github.stefanbratanov.jvm.openai.OpenAIClient.lambda$validateHttpResponse$6(OpenAIClient.java:129)
at java.base/java.util.Optional.ifPresentOrElse(Optional.java:196)
at [email protected]/io.github.stefanbratanov.jvm.openai.OpenAIClient.validateHttpResponse(OpenAIClient.java:127)
at [email protected]/io.github.stefanbratanov.jvm.openai.OpenAIClient.sendHttpRequest(OpenAIClient.java:85)
at [email protected]/io.github.stefanbratanov.jvm.openai.OpenAIClient.sendHttpRequest(OpenAIClient.java:78)
at [email protected]/io.github.stefanbratanov.jvm.openai.ChatClient.createChatCompletion(ChatClient.java:37)
at [email protected]/org.jabref.logic.ai.chatting.model.JvmOpenAiChatLanguageModel.generate(JvmOpenAiChatLanguageModel.java:65)
at [email protected]/org.jabref.logic.ai.chatting.model.JabRefChatLanguageModel.generate(JabRefChatLanguageModel.java:142)
at [email protected]/dev.langchain4j.chain.ConversationalRetrievalChain.execute(ConversationalRetrievalChain.java:85)
at [email protected]/dev.langchain4j.chain.ConversationalRetrievalChain.execute(ConversationalRetrievalChain.java:32)
at [email protected]/org.jabref.logic.ai.chatting.AiChatLogic.execute(AiChatLogic.java:168)
at [email protected]/org.jabref.gui.ai.components.aichat.AiChatComponent.lambda$onSendMessage$9(AiChatComponent.java:204)
at [email protected]/org.jabref.logic.util.BackgroundTask$1.call(BackgroundTask.java:73)
at [email protected]/org.jabref.gui.util.UiTaskExecutor$1.call(UiTaskExecutor.java:191)
at javafx.graphics@23/javafx.concurrent.Task$TaskCallable.call(Task.java:1401)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
```
I think there is also an issue when switching models.
Even after clearing the chat history, it is not possible to get around this error message unless I switch to a different entry in JabRef. Then I can chat again.
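The 400 error above (`Invalid 'messages[2].role': did not expect 'user' here`) is consistent with a server-side chat template that requires an optional leading system message followed by strictly alternating user/assistant turns. If the client injects retrieved document context as an extra user message, `messages[2]` becomes a second consecutive "user" and the request is rejected. The sketch below is my hypothesis about the failure mode and one possible workaround, not JabRef's actual code.

```python
def violates_template(messages):
    """True if roles do not follow: [system], user, assistant, user, ..."""
    roles = [m["role"] for m in messages]
    if roles and roles[0] == "system":
        roles = roles[1:]
    expected = ["user", "assistant"]
    return any(r != expected[i % 2] for i, r in enumerate(roles))

def merge_consecutive_user(messages):
    """One possible workaround: fold adjacent user messages into one,
    so injected context and the actual question share a single turn."""
    merged = []
    for m in messages:
        if merged and m["role"] == "user" and merged[-1]["role"] == "user":
            merged[-1] = {"role": "user",
                          "content": merged[-1]["content"] + "\n\n" + m["content"]}
        else:
            merged.append(dict(m))
    return merged
```

If this hypothesis is right, it would also explain why the OpenAI endpoint tolerates the same request: OpenAI accepts consecutive same-role messages, while stricter local templates do not.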
@InAnYan
Such an interesting issue and such an interesting behaviour...
IMHO it's very wrong that there is no OpenAI-API-compatible mode in GPT4All. (Standardization has made people's lives much easier.)
However, I will look into this issue in more detail, because GPT4All is a popular app and this is also important.
This is so weird. I tried it last night and encountered the problem mentioned above, that is, the returned content was truncated. But when I tried it again this afternoon, the content was returned normally without truncation.
Ok... I reproduced it. I will try to fix it.
Oh, @FeiLi-lab, @ThiloteE, when you have the issue with truncated output, could you try clicking on the text area?
There is a bug in the UI when the text area is not expanded. Could this be the cause of the truncated output?
I will try to have a look at this on the weekend.
This may also have been https://github.com/ggerganov/llama.cpp/pull/9867 in upstream llama.cpp. The fix would need some time to reach downstream GPT4All.
I am also working on this issue. I think the problem that responses look cut off in the middle may come from the request.
I ran the following two commands on my computer, one with max_tokens set and one without, and the result shows that the answer without max_tokens set was cut off.
```shell
curl -X POST http://localhost:4891/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Phi-3.1-mini-128k-instruct-Q4_0-precise-output-tensor", "messages": [{"role": "user", "content": "could you please introduce more about your self?"}], "max_tokens": 2048, "temperature": 0.7}'

curl -X POST http://localhost:4891/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Phi-3.1-mini-128k-instruct-Q4_0-precise-output-tensor", "messages": [{"role": "user", "content": "could you please introduce more about your self?"}], "temperature": 0.7}'
```
I set max_tokens in the code, and now the response looks complete.
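One way to confirm the truncation hypothesis without eyeballing the text: an OpenAI-compatible response reports why generation stopped in `choices[0].finish_reason`. `"length"` means the completion hit the token limit (i.e. the answer was cut off); `"stop"` means it ended naturally. The responses below are hand-written examples, not captured server output.

```python
def was_truncated(response):
    """Check an OpenAI-style chat completion for token-limit truncation."""
    return response["choices"][0]["finish_reason"] == "length"

# Hypothetical responses illustrating the two cases:
truncated = {"choices": [{"finish_reason": "length",
                          "message": {"role": "assistant", "content": "…cut"}}]}
complete = {"choices": [{"finish_reason": "stop",
                         "message": {"role": "assistant", "content": "done"}}]}
```

If JabRef logged `finish_reason` on each reply, distinguishing a too-small token limit from the UI text-area bug mentioned earlier would be straightforward.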
preference:
JabRef:
@ThiloteE I will refine the code later if you can review it.
Oh, nice! Good to know. Yes, a pull request would be nice; otherwise nobody can review it. Do you think this is something we would need to add to the preferences?
Welcome to the vibrant world of open-source development with JabRef!
Newcomers, we're excited to have you on board. Start by exploring our Contributing guidelines, and don't forget to check out our workspace setup guidelines to get started smoothly.
In case you encounter failing tests during development, please check our developer FAQs!
Having any questions or issues? Feel free to ask here on GitHub. Need help setting up your local workspace? Join the conversation on JabRef's Gitter chat. And don't hesitate to open a (draft) pull request early on to show the direction it is heading towards. This way, you will receive valuable feedback.
⚠ Note that this issue will become unassigned if it isn't closed within 30 days.
🔧 A maintainer can also add the Pinned label to prevent it from being unassigned automatically.
Happy coding! 🚀