[BUG] Summarize fails even when a model response is generated with the error "HTTP request failed: POST predict: Post "http://127.0.0.1:36333/completion": EOF"
Pre-check
- [X] I have searched the existing issues and none cover this bug.
Description
While using Summarize I keep getting the error below. I had to fix summarize_service.py and ui.py to catch it properly. Here is the returned error:
10:30:57.046 [ERROR ] private_gpt.server.recipes.summarize.summarize_service - HTTP request failed: POST predict: Post "http://127.0.0.1:36333/completion": EOF
It looked like a backend issue, but we can clearly see that the response is created correctly by the Ollama backend:
Given the information from multiple sources and not prior knowledge, answer the query. Query: Provide a comprehensive summary of the provided context information. The summary should cover all the key points and main ideas presented in the original text, while also condensing the information into a concise and easy-to-understand format. Please ensure that the summary includes relevant details and examples that support the main ideas, while avoiding any unnecessary information or repetition.
Answer:
**Response:** assistant: The provided context information outlines various aspects of user management, software usage tracking, and computer inventory in an IT system. Here's a comprehensive summary:
Now, I had to fix a lot in the code. I fixed all of the following:
Summary of the Main Problem
The main problem involved handling asynchronous operations correctly in the summarization service and the UI. Specifically, the issues were (a sketch of these changes follows this list):
- The code was erroring because of nested async calls, which required the use of `nest_asyncio` to allow nested event loops.
- The `_summarize` method needed to be converted to an async generator to handle streaming responses correctly. For that, the `stream_summarize` and `summarize` methods needed to use the async generator correctly.
- Proper error handling was added for exceptions such as `asyncio.CancelledError`, `ResponseError`, and `StopAsyncIteration`.
- The `_chat` method in the UI needed to handle async streaming responses correctly, making sure that `stream_summarize` was consumed in an `async for` loop in the UI.
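For reference, here is a minimal sketch of the shape of those fixes. The names `_summarize` and `_chat` follow the methods mentioned above, but the bodies are illustrative only, not the actual patch; the full changes are in the attached files.

```python
import asyncio
from collections.abc import AsyncGenerator

import nest_asyncio

nest_asyncio.apply()  # allow nested event loops (UI loop + query-engine loop in one process)


async def _summarize(query_engine, prompt: str) -> AsyncGenerator[str, None]:
    # Illustrative async generator: stream summary tokens instead of returning one string.
    try:
        streaming_response = await query_engine.aquery(prompt)
        # Assumes the engine exposes an async token generator; the real patch also
        # catches ollama's ResponseError at this point.
        async for token in streaming_response.async_response_gen():
            yield token
    except (asyncio.CancelledError, StopAsyncIteration):
        # Stream cancelled (e.g. the client disconnected) or exhausted: end cleanly.
        return


async def _chat(query_engine, prompt: str) -> str:
    # Illustrative UI-side consumer: the summary must be read with `async for`.
    partial = ""
    async for token in _summarize(query_engine, prompt):
        partial += token
    return partial
```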
As I see it, the error here is due to one of the following, since I have ruled out server-side and networking issues and we can clearly see that a response is generated by the model:
- Timeout: if the server takes too long to respond, the client might close the connection, which can result in an EOF error (see the timeout sketch after this list).
- Incorrect query parameters: if these are incorrect or malformed, the server will close the connection.
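If the timeout theory is right, the knob that matters should be the client-side request timeout on the Ollama LLM itself rather than on the query engine. A minimal sketch, assuming the llama-index Ollama wrapper is what sits underneath (the model name and the 300-second value are just examples):

```python
from llama_index.llms.ollama import Ollama

# Assumption: the LLM component wraps llama-index's Ollama client. request_timeout is the
# per-request HTTP timeout in seconds; if the server takes longer than this, the client
# closes the connection, which would surface as an EOF on the /completion call.
llm = Ollama(
    model="llama3",                      # example model name
    base_url="http://localhost:11434",
    request_timeout=300.0,
)
```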
Steps to Reproduce
- Ingest (RAG) some larger documents
- Run Summarize
Expected Behavior
No errors and correct summarization
Actual Behavior
Summarize fails even though a model response is generated, with the error "HTTP request failed: POST predict: Post "http://127.0.0.1:36333/completion": EOF"
Environment
CUDA 12, Ubuntu, Ollama profile
Additional Information
No response
Version
No response
Setup Checklist
- [X] Confirm that you have followed the installation instructions in the project’s documentation.
- [X] Check that you are using the latest version of the project.
- [X] Verify disk space availability for model storage and data processing.
- [X] Ensure that you have the necessary permissions to run the project.
NVIDIA GPU Setup Checklist
- [X] Check that all CUDA dependencies are installed and are compatible with your GPU (refer to CUDA's documentation)
- [X] Ensure an NVIDIA GPU is installed and recognized by the system (run `nvidia-smi` to verify).
- [X] Ensure proper permissions are set for accessing GPU resources.
- [ ] Docker users - Verify that the NVIDIA Container Toolkit is configured correctly (e.g. run `sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi`)
Adding my changes: changes_summarize_service.txt, changes_ui.txt
So, the SummarizeService is retrieved from the request state using `request.state.injector.get(SummarizeService)`, and for the streaming response it uses `to_openai_stream` to convert the response to an SSE stream. The issue is somewhere here.
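Roughly, the path I am describing looks like this. This is a simplified sketch: `SummarizeService`, the injector call and `to_openai_stream` are the ones referenced above, while the import paths and request body shape are assumptions on my part, not the project's actual router code.

```python
from fastapi import APIRouter, Request
from starlette.responses import StreamingResponse

# Assumed project-internal imports (paths not verified here):
# from private_gpt.server.recipes.summarize.summarize_service import SummarizeService
# from private_gpt.open_ai.openai_models import to_openai_stream

summarize_router = APIRouter()


@summarize_router.post("/summarize")
def summarize(request: Request, body: dict) -> StreamingResponse:
    # The service is resolved from the per-request injector, as noted above.
    service = request.state.injector.get(SummarizeService)
    # stream_summarize yields summary chunks; to_openai_stream turns them into an SSE stream.
    chunks = service.stream_summarize(text=body["text"])
    return StreamingResponse(to_openai_stream(chunks), media_type="text/event-stream")
```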
I also tested increasing the timeout value here, which did not have any effect:

```python
query_engine = summary_index.as_query_engine(
    llm=self.llm_component.llm,
    response_mode=ResponseMode.TREE_SUMMARIZE,
    streaming=stream,
    use_async=self.settings.summarize.use_async,
    timeout=360,  # <-- increase timeout to 360 seconds
)
```

As you can see, the nested async issue is solved using my attached code changes; we are at least seeing clear logs now.
- Software management includes three sub-tabs: Applications, Raw Usage, and Active Usage.
- These tabs display information about software applications used by users, such as total usage, active usage, raw usage data, and more.
PART OF THE MODEL RESPONSE:
... Overall, the system provides a comprehensive platform for managing subscriptions, uploaded files, integration platforms, user profiles, and software usage. The system's features are designed to streamline processes, provide insights into user behavior, and support various workflows.
```
11:37:04.888 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.888 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.889 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.889 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.889 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.889 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.890 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.890 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.890 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.890 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.890 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.891 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.891 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.891 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.891 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.891 [ERROR ] private_gpt.server.recipes.summarize.summarize_service - HTTP request failed: POST predict: Post "http://127.0.0.1:33899/completion": EOF
11:37:04.893 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.893 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.893 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.934 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.935 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.935 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.974 [INFO ] uvicorn.access - 127.0.0.1:53322 - "POST /run/predict HTTP/1.1" 200
```
Can you check the Ollama server logs? The problem is Ollama-related, since it's throwing 500s. Something with the context window, computer resources, etc.
Any fix for this issue?