[Question]: Parsing is stuck when applying
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
Describe your problem
I deployed an embedding model locally and connected it to RAGFlow through OpenAI-API-Compatible, but parsing is consistently stuck, as shown in the figure
and the logs:
The embedding service never received a request; the only log entry is from when the model was added through OpenAI-API-Compatible.
Hi @Jwpap! I'm Dosu and I’m helping the ragflow team.
Parsing getting stuck with a local OpenAI-API-Compatible embedding model is often due to configuration or registration issues. Here are the most likely causes and what you can check:
- Make sure your embedding model is registered in `conf/llm_factories.json` with the correct format and that the backend recognizes it. The model identifier must match the `<model_name>@OpenAI-API-Compatible` pattern, and both parts must be non-empty strings. If the model isn't registered or the backend doesn't recognize the factory/provider, parsing won't proceed and your embedding service won't receive requests (details) (setup).
- If you're on RAGFlow v0.19.0, there was a bug that prevented some custom models from working. This has been fixed in a later version. You can either update to a newer version or manually patch the code in your Docker container (edit `api/utils/api_utils.py` to include your model, then restart the container) (reference).
- Double-check your embedding model's `base_url` and network connectivity. If RAGFlow can't reach your local embedding service, no requests will be sent. Sometimes using `127.0.0.1` or `localhost` doesn't work inside Docker; try the host's internal IP (e.g., `172.17.0.1:your_port`) (example).
- Make sure the task executor is running. In the web UI, check under System for the task executor's health status, or look for regular 'reported heartbeat' messages in the logs. If the executor is down, restart it (troubleshooting).
- If your embedding API's structure differs from what RAGFlow expects (input: a string or list of strings; output: embedding(s) in a predictable field), you may need to write a small adapter service (integration details); a minimal connectivity check is sketched below.
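To rule out the first and third causes quickly, you can call the embedding endpoint directly from inside the RAGFlow container. A minimal sketch, assuming an OpenAI-compatible service on the Docker bridge IP at port 8000 serving a model named `my-embed` (host, port, model name, and key are all placeholders; substitute your own values):

```bash
# Placeholder host, port, model name, and API key -- use your deployment's values.
docker exec -it ragflow-server curl -s http://172.17.0.1:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy-key" \
  -d '{"model": "my-embed", "input": ["hello world"]}'
```

An OpenAI-compatible service should answer with JSON containing a `data` array whose items each carry an `embedding` field; if this call never reaches your service, the problem is networking rather than parsing.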
If you check these areas and parsing is still stuck, please share your llm_factories.json entry and any error logs you see in the RagFlow server or task executor logs.
After commenting out the code marked in red, parsing works normally. What is entrypoint.sh used for?
The entrypoint.sh script is the main startup script for the RAGFlow Docker container—it configures and launches all the core services inside the container, like the web server, task executor, and (optionally) the MCP server. It also processes environment variables and command-line arguments to control which components run and how they're configured. When the container starts, Docker runs this script automatically as defined in the Dockerfile's ENTRYPOINT directive reference.
If you commented out (annotated) the volume mount for entrypoint.sh in your docker-compose.yml and parsing started working, it's likely because your local copy of entrypoint.sh was either outdated or mismatched with the version inside the container. Overriding the container's built-in script with a local version can prevent critical background services (like the task executor, which handles parsing) from starting correctly, causing the system to hang. By removing the override, the container uses its own, version-matched entrypoint.sh, which ensures all required services start as expected reference.
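If you want to verify such a mismatch before removing the override, comparing checksums is a quick test. A sketch, assuming the container is named `ragflow-server` and the baked-in script lives at `/ragflow/entrypoint.sh` (adjust both to your setup):

```bash
# Checksum of the local copy being bind-mounted.
md5sum docker/entrypoint.sh

# Checksum of the script inside the running container.
docker exec ragflow-server md5sum /ragflow/entrypoint.sh
```

Differing hashes mean the mount was overriding the image's version-matched script.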
@dosu another question: I am currently using the official image. If I modify the code locally and want to start the service with my local code, what should I do?
You have two main options:
- If you want to quickly test code changes, you can exec into the running container (for example, `docker exec -it ragflow-server bash`), edit the files, then restart the container to apply the changes. This works for small tweaks but isn't ideal for ongoing development or version control (reference).
- For full development, it's better to run RAGFlow from source on your machine. Clone the repo, install dependencies, launch the required services (MySQL, Redis, etc.) with Docker Compose, and run the backend/frontend locally. This way, any code changes you make are immediately reflected. There's a step-by-step guide here: https://ragflow.io/docs/dev/launch_ragflow_from_source (reference).
If you use Docker Compose, you can also mount your local code directory into the container as a volume (in the volumes section), so the container uses your local files instead of the ones baked into the image. Just make sure the code inside the container matches your local changes and restart the service after edits.
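A hypothetical fragment of such a mount; the service name and the in-container path vary by setup and image version, so match them to your compose file and to where the code actually lives in the image:

```yaml
# docker-compose.yml fragment (illustrative paths only)
services:
  ragflow:
    volumes:
      - ./api:/ragflow/api   # local backend code shadows the baked-in copy
      - ./rag:/ragflow/rag
```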
@dosu I want to deploy local services in an Ubuntu 20.04 container, but pyicu==2.15 fails to install when running `uv sync --python 3.10`. A possible reason is that Ubuntu 20.04 does not support installing pyicu==2.15. Is there a solution?
This is a known issue: pyicu==2.15 expects a newer ICU library than what Ubuntu 20.04 provides by default. The system ICU (libicu) on Ubuntu 20.04 is too old, which causes the build to fail with Python 3.10 and recent pyicu versions.
To fix it, you need to upgrade the ICU library to at least version 75. The recommended steps are:
- Download and build ICU from source:

```bash
wget https://github.com/unicode-org/icu/releases/download/release-76-1/icu4c-76_1-src.tgz
tar -xzf icu4c-76_1-src.tgz
cd icu/source
./configure --prefix=/opt/icu
make -j
sudo make install
```

- Set environment variables so PyICU can find the new ICU:

```bash
export LDFLAGS="-L/opt/icu/lib"
export CPPFLAGS="-I/opt/icu/include"
export LD_LIBRARY_PATH="/opt/icu/lib"
```

- Then install PyICU:

```bash
pip install pyicu
```

or:

```bash
sudo ICU_VERSION=76 pip install pyicu
```
This approach has resolved the same issue for others on Ubuntu 20.04 with Python 3.10 and pyicu 2.15. More details and discussion are available in the related issues: issue 8110, issue 5589.
If you still have trouble, make sure you have build tools and pkg-config installed (`sudo apt-get install build-essential pkg-config`).
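Once PyICU builds, you can sanity-check which ICU it linked against; PyICU exposes the version as `icu.ICU_VERSION`:

```bash
python -c "import icu; print(icu.ICU_VERSION)"   # should report 76.x after the steps above
```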
@dosu. Running `python3 api/ragflow_server.py`, I meet: ValueError: not find model file path /workspace/ragflow/rag/res/deepdoc/det.onnx
The error means RAGFlow can't find the required OCR model file det.onnx in /workspace/ragflow/rag/res/deepdoc/. By default, RAGFlow tries to auto-download this file from HuggingFace (InfiniFlow/deepdoc). If your server can't access huggingface.co (due to firewall, proxy, or network limits), the download fails and you get this error.
To fix it:
- Make sure your server/container has network access to huggingface.co or hf-mirror.com, then restart the service and it should auto-download the file.
- If network access isn't possible, manually download det.onnx from https://huggingface.co/InfiniFlow/deepdoc and place it in /workspace/ragflow/rag/res/deepdoc/ inside your container or working directory.
- After the file is in place, restart ragflow_server.py and the error should be resolved.
This is required for all visual document parsing features in RAGFlow and is a documented setup step for air-gapped or restricted environments.
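If you need to fetch the model files manually, one option is the `huggingface_hub` CLI. A sketch, assuming the default layout of the InfiniFlow/deepdoc repo; set `HF_ENDPOINT` to a mirror if huggingface.co is blocked:

```bash
pip install -U huggingface_hub
export HF_ENDPOINT=https://hf-mirror.com   # optional: route through a mirror
huggingface-cli download InfiniFlow/deepdoc \
  --local-dir /workspace/ragflow/rag/res/deepdoc
```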
@dosu another question:

Resource punkt_tab not found. Please use the NLTK Downloader to obtain the resource:

```python
import nltk
nltk.download('punkt_tab')
```
This error means the NLTK `punkt_tab` tokenizer data is missing (recent NLTK releases use `punkt_tab` in place of the older `punkt` model). You can fix it by running:

```python
import nltk
nltk.download('punkt_tab')
```

This will download the required resource. Make sure your environment has internet access. If you want to download all NLTK data at once, you can use:

```bash
python -m nltk.downloader all
```

If you need to specify a custom download directory, use:

```python
nltk.download('punkt_tab', download_dir='/your/path/to/nltk_data')
```
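Note that NLTK only searches a fixed set of data paths, so if you use a custom directory, also point NLTK at it (standard NLTK behavior; the path is illustrative):

```bash
export NLTK_DATA=/your/path/to/nltk_data
```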
More details are in the related discussion here.
@dosu is there a way to deploy local services with Docker-in-Docker?
RAGFlow doesn't use traditional Docker-in-Docker (DinD), where you run a full Docker daemon inside a container. Instead, it uses a privileged service called sandbox-executor-manager, which mounts the host's Docker socket into the container. This lets the service launch and manage additional containers on the host for sandboxed code execution, but it doesn't start a nested Docker daemon or support general-purpose DinD workflows (reference, reference).
If you want to deploy local services (like embedding models, databases, etc.) alongside RAGFlow, the recommended way is to use Docker Compose to define all your services in one file and run them together. This is simpler, safer, and fully supported.
If you have a specific use case that really needs DinD (like running containers inside containers for CI/CD), let me know your scenario—there may be a workaround, but it's not the standard or recommended approach for RAGFlow.
@dosu executing the command `LD_PRELOAD=$JEMALLOC_PATH python rag/svr/task_executor.py 1`, I meet the following error:

2025-06-25 16:22:56,846 INFO 165 task_executor_1 log path: /workspace/ragflow/logs/task_executor_1.log, log levels: {'peewee': 'WARNING', 'pdfminer': 'WARNING', 'root': 'INFO'}
2025-06-25 16:22:56,847 INFO 165
(RAGFlow ASCII-art startup banner)
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
2025-06-25 16:22:56,849 INFO 165 TaskExecutor: RAGFlow version: unknown full
2025-06-25 16:22:56,850 WARNING 165 SECURITY WARNING: Using auto-generated SECRET_KEY. Generated key: 3ac5f1612ecb5e8aa41ff28a368d81389116046fe0e3bdf45491e3ed4b892bf8
2025-06-25 16:22:56,850 INFO 165 Use Elasticsearch http://localhost:1200 as the doc engine.
2025-06-25 16:22:56,852 INFO 165 GET http://localhost:1200/ [status:N/A duration:0.001s]
2025-06-25 16:22:56,852 WARNING 165 Node <Urllib3HttpNode(http://localhost:1200)> has failed for 1 times in a row, putting on 1 second timeout
2025-06-25 16:22:56,852 WARNING 165 Retrying request after failure (attempt 0 of 3)
Traceback (most recent call last):
File "/workspace/ragflow/.venv/lib/python3.10/site-packages/elastic_transport/_transport.py", line 328, in perform_request
meta, raw_data = node.perform_request(
File "/workspace/ragflow/.venv/lib/python3.10/site-packages/elastic_transport/_node/_http_urllib3.py", line 202, in perform_request
raise err from None
elastic_transport.ConnectionError: Connection error caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7fcad72fd660>: Failed to establish a new connection: [Errno 111] Connection refused)
[... the same connection-refused retries and tracebacks repeat with increasing back-off until the 120 s health check gives up ...]
2025-06-25 16:23:06,872 ERROR 165 Elasticsearch http://localhost:1200 is unhealthy in 120s.
Traceback (most recent call last):
File "/workspace/ragflow/rag/svr/task_executor.py", line 777, in
The error means RAGFlow can't connect to Elasticsearch at http://localhost:1200—either Elasticsearch isn't running, isn't listening on that port, or the config is pointing to the wrong place. This is a hard requirement: if ES isn't reachable, task_executor.py will keep retrying and eventually fail with "unhealthy in 120s" details.
Here's what to check:
- Make sure Elasticsearch is running and listening on port 1200. If you're using Docker Compose, confirm the ES container is up and healthy (`docker ps` or `docker-compose ps`).
- Check your config files: `.env`, `conf/service_conf.yaml`, and `docker/service_conf.yaml.template`. The ES host/port in these files must match where Elasticsearch is actually running. By default it's `http://localhost:1200`, but in Docker Compose setups it might be something like `http://es01:1200` (example config).
- If ES is running in another container, make sure the network settings allow access from the RAGFlow container. Try `curl http://localhost:1200` (or the configured address) from inside the RAGFlow container to verify connectivity; see the sketch below.
- If you change the config, restart both the Elasticsearch and RAGFlow services.
- If ES is running but still not accepting connections, check the Elasticsearch logs for errors.
If you need to point RAGFlow to an external or differently-named ES instance, update the host in your config files and restart reference.
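A quick way to run the connectivity checks above; the container name `ragflow-server` and service name `es01` are compose defaults and may differ in your deployment:

```bash
# From the host: is anything listening where RAGFlow expects ES?
curl -s http://localhost:1200 | head

# From inside the RAGFlow container, 'localhost' is the container itself,
# so try the compose service name or the Docker bridge IP instead.
docker exec -it ragflow-server bash -c \
  'curl -s http://es01:9200 || curl -s http://172.17.0.1:1200'
```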
@dosu ES is running in another container, and `curl http://172.17.0.1:1200` is OK, but I still meet the error:

elastic_transport.ConnectionError: Connection error caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7f54d89b7460>: Failed to establish a new connection: [Errno 111] Connection refused)
2025-06-25 17:32:13,091 INFO 716 HEAD http://localhost:1200/ [status:N/A duration:0.000s]
2025-06-25 17:32:13,091 WARNING 716 Node <Urllib3HttpNode(http://localhost:1200)> has failed for 8 times in a row, putting on 30 second timeout
2025-06-25 17:32:13,091 ERROR 716 Elasticsearch http://localhost:1200 is unhealthy in 120s.
RAGFlow is still trying to connect to http://localhost:1200 because that's the default in its config files. Inside a Docker container, 'localhost' refers to the container itself, not the host or other containers. Since your Elasticsearch is reachable at http://172.17.0.1:1200, you need to update the RAGFlow config to use this address.
If you're running with Docker Compose, edit your docker/.env file and set:

```
ES_HOST=172.17.0.1
ES_PORT=1200
```

Then make sure your docker/service_conf.yaml.template uses these variables (it should by default: `hosts: 'http://${ES_HOST:-es01}:9200'`), adjusting the port if needed.

If you're running locally or not using Compose, update conf/service_conf.yaml:

```yaml
es:
  hosts: 'http://172.17.0.1:1200'
```
After making changes, restart your RAGFlow service/container so it reloads the new settings. This should resolve the connection issue. For more details, see the relevant config examples and discussion here and here.
@dosu executing `python api/ragflow_server.py`, I meet the following error:
2025-06-25 18:00:49,177 INFO 2443 ragflow_server log path: /workspace/ragflow/logs/ragflow_server.log, log levels: {'peewee': 'WARNING', 'pdfminer': 'WARNING', 'root': 'INFO'}
2025-06-25 18:00:49,551 INFO 2443 can't import package 'torch'
2025-06-25 18:00:51,739 INFO 2443 init database on cluster mode successfully
2025-06-25 18:00:53,524 INFO 2443 load_model /workspace/ragflow/rag/res/deepdoc/det.onnx uses CPU
2025-06-25 18:00:53,629 INFO 2443 load_model /workspace/ragflow/rag/res/deepdoc/rec.onnx uses CPU
Traceback (most recent call last):
File "/workspace/ragflow/api/ragflow_server.py", line 36, in
The error ImportError: libodbc.so.2: cannot open shared object file means your system is missing the ODBC libraries required by pyodbc, which RAGFlow imports by default—even if you don't use MSSQL features. To fix this on Ubuntu 20.04, run:
```bash
sudo apt-get update
sudo apt-get install -y unixodbc unixodbc-dev
```
This will install the required libraries and allow pyodbc to import successfully, letting RAGFlow start up as expected reference.
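To confirm the fix before restarting the server, a quick import check (pyodbc exposes its version as `pyodbc.version`):

```bash
python -c "import pyodbc; print(pyodbc.version)"   # a version string instead of ImportError
```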
For the "can't import package 'torch'" message: if you need torch features, install it with

```bash
uv sync --python 3.10 --all-extras
```

or:

```bash
pip install .[full]
```
If you're using the slim version of RAGFlow and don't need torch, you can ignore this message—it won't affect core functionality reference.
Let me know if you hit any other errors after installing the ODBC libraries.
@dosu why is the port 9222? How can I change it for local services?
Port 9222 is the default for the frontend development server in RAGFlow—it’s set in the web/.env file with the line PORT=9222. To change it, just edit that file and set PORT to your desired value (for example, PORT=3000). Restart the frontend dev server after making the change for it to take effect. This is a standard setup for projects using Vite or similar frontend tooling (source).
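For example, switching to a hypothetical port 3000 in place:

```bash
sed -i 's/^PORT=.*/PORT=3000/' web/.env   # or simply edit web/.env by hand
```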
@dosu I uploaded a markdown file, but the content shown after upload is `{"code":100,"data":null,"message":"TypeError(\"The view function for 'document.get' did not return a valid response. The function either returned None or ended without a return statement.\")"}`, which is different from the original content. It also leads to a parse failure.
This error usually means the markdown file wasn't actually saved to storage or mapped in the database during upload, so when RAGFlow tries to retrieve or parse it, it can't find the file and throws an internal error. The TypeError happens because the backend view function hits an unhandled exception (like missing file) and doesn't return a valid response, so you get that JSON error instead of the file content.
Most often, this is caused by one of these: storage backend misconfiguration (e.g., MINIO/S3 not accessible), permission issues, or a bug in the upload logic that fails to persist the file or its mapping. Recent changes also enforce stricter filename length checks and error handling, so if your filename is too long or the file type isn't recognized, the upload will silently fail and parsing will break (source).
To debug:
- Check your storage backend config and make sure RAGFlow can write/read files (look for errors in the backend logs right after upload).
- Confirm the file appears in your storage (MINIO/S3/local) and in the database after upload.
- Try uploading a simple markdown file with a short filename to rule out filename length issues.
- If you see errors in the logs during upload, share them—they'll point to the root cause.
If the file isn't saved or mapped, parsing will always fail with "No such file or directory" (source). Enable debug logging if needed to capture more details during upload and parse attempts.
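To check the storage side directly, the MinIO `mc` client can confirm whether the object actually landed. A sketch with placeholder endpoint and credentials (use the values from your docker/.env):

```bash
# Register the MinIO endpoint under an alias (credentials are placeholders).
mc alias set ragflow-minio http://localhost:9000 <access-key> <secret-key>

# List buckets, then the bucket backing your knowledge base.
mc ls ragflow-minio
mc ls ragflow-minio/<your-kb-bucket>
```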
@dosu the logs are the following:

minio.error.S3Error: S3 operation failed; code: XMinioStorageFull, message: Storage backend has reached its minimum free drive threshold. Please delete a few objects to proceed., resource: /c5253a86532811f09b381d8ba05b7df9/README(1).md, request_id: 184EFA675C288B4B, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: c5253a86532811f09b381d8ba05b7df9, object_name: README(1).md
2025-07-04 15:03:47,934 ERROR 298 Fail to put c5253a86532811f09b381d8ba05b7df9/README(1).md

Where can I delete files?
You should delete files using the RAGFlow web UI file manager—just select files or folders you no longer need and use the delete (trash) button. This will safely remove them from both MinIO storage and the RAGFlow database, freeing up space and preventing orphaned records. If you can't access the UI, you can call the /rm API endpoint with the file IDs to delete via a POST request; this triggers the same backend cleanup logic.
Avoid deleting files directly from MinIO (using the MinIO client, web console, or by removing files from the data directory), as this can leave broken references in RAGFlow and cause further errors. Only use direct MinIO deletion if the UI/API is unavailable, and be aware you'll need to manually clean up the database afterward to avoid inconsistencies.
If you need to prioritize, look for large or old files to delete first. After deleting, try uploading again or check MinIO usage stats to confirm space has been freed.
More details on the storage structure and deletion process are available here and here.
@dosu. How can I check MinIO's remaining and maximum space, and where can I modify it? I only uploaded one file and it's already full.
MinIO storage space is determined by the size of the Docker volume or host disk where its data is stored—not by any RAGFlow or MinIO config setting (unless you use MinIO Enterprise quotas, which aren't referenced here) source.
To check remaining and total space:
- Open the MinIO web console (usually at `http://localhost:9001`, or the port you mapped in Docker Compose). It shows some usage stats, but for detailed free/used space, check on the Docker host.
- On the host, run `docker volume inspect minio_data` to find the data path, then use `df -h /var/lib/docker/volumes/minio_data/_data` or `du -sh /var/lib/docker/volumes/minio_data/_data` to see actual disk usage and free space (source). A one-liner combining these is sketched below.
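For convenience, the two steps can be chained; the volume name `minio_data` is taken from the answer above, and on many setups it is prefixed with the compose project name (e.g. `docker_minio_data`):

```bash
# Resolve the volume's mount point, then report disk usage for it.
MP=$(docker volume inspect minio_data --format '{{ .Mountpoint }}')
df -h "$MP"
du -sh "$MP"
```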
To increase available space, expand the underlying disk/partition, or move the MinIO data directory to a larger disk and update the Docker Compose volume mapping. There are no RAGFlow-side settings for MinIO quotas or space reporting—it's all managed at the Docker/host level.
If MinIO is full after just one file, it's likely the Docker volume or host disk is very small or nearly full, or the volume mapping is misconfigured. Free up space by deleting files in the RAGFlow UI, or increase the disk/volume size as above.
Quoting the command and error above (`python api/ragflow_server.py` failing with ImportError: libodbc.so.2: cannot open shared object file: No such file or directory): how did you solve this problem? I'm running into the same issue.
Refer to the dosu answer below, or ask GPT; I solved it by installing a few similar packages.