
[Question]: Unable to interpret file

Open polemp opened this issue 11 months ago • 5 comments

Describe your problem

I uploaded a very simple text file, but parsing stalls every time before reaching 10% and does not advance for dozens of minutes. Checking the log revealed an error. How can I solve it? The server has enough disk space, 10 cores, and 32 GB of memory, and CPU usage is very low.

Log:

2025-02-13 12:02:34,419 INFO 20 task_consumer_0 reported heartbeat: {"name": "task_consumer_0", "now": "2025-02-13T12:02:34.417+08:00", "boot_at": "2025-02-13T11:42:32.798+08:00", "pending": 0, "lag": 0, "done": 0, "failed": 4, "current": null}
2025-02-13 12:02:34,450 INFO 19 172.18.0.3 - - [13/Feb/2025 12:02:34] "POST /v1/document/run HTTP/1.1" 200 -
2025-02-13 12:02:34,471 INFO 20 handle_task begin for task {"id": "5adc36a2e9bf11ef97940242ac120003", "doc_id": "da05ad88e9b911ef9f7b0242ac120004", "from_page": 0, "to_page": 100000000, "retry_count": 0, "kb_id": "b16deaf2e9b911ef8b450242ac120004", "parser_id": "naive", "parser_config": {"auto_keywords": 3, "auto_questions": 1, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}, "chunk_token_num": 128, "delimiter": "\n!?;\u3002\uff1b\uff01\uff1f", "layout_recognize": "DeepDOC", "html4excel": false}, "name": "11111.txt", "type": "doc", "location": "11111.txt", "size": 3735, "tenant_id": "44670308e9b911efa2950242ac120004", "language": "Chinese", "embd_id": "EntropyYue/jina-embeddings-v2-base-zh:160m@Ollama", "pagerank": 2, "kb_parser_config": {"auto_keywords": 3, "auto_questions": 1, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}}, "img2txt_id": "", "asr_id": "", "llm_id": "qwen2.5@Ollama", "update_time": 1739419354436, "task_type": ""}
2025-02-13 12:02:34,481 INFO 19 172.18.0.3 - - [13/Feb/2025 12:02:34] "GET /v1/document/list?kb_id=b16deaf2e9b911ef8b450242ac120004&keywords=&page_size=10&page=1 HTTP/1.1" 200 -
2025-02-13 12:02:34,849 INFO 20 HTTP Request: POST http://kode.work:11434/api/embeddings "HTTP/1.1 200 OK"
2025-02-13 12:02:35,090 INFO 20 HEAD http://es01:9200/ragflow_44670308e9b911efa2950242ac120004 [status:200 duration:0.222s]
2025-02-13 12:02:35,109 INFO 20 From minio(0.018906587000174113) 11111.txt/11111.txt
2025-02-13 12:02:35,119 INFO 20 set_progress(5adc36a2e9bf11ef97940242ac120003), progress: 0.1, progress_msg: 12:02:35 Page(1~100000001): Start to parse.
2025-02-13 12:02:35,130 INFO 20 set_progress(5adc36a2e9bf11ef97940242ac120003), progress: -1, progress_msg: 12:02:35 Page(1~100000001): [ERROR]Internal server error while chunking: failed to acquire lock update_progress
2025-02-13 12:02:35,141 INFO 20 set_progress(5adc36a2e9bf11ef97940242ac120003), progress: -1, progress_msg: 12:02:35 [ERROR][Exception]: failed to acquire lock update_progress
2025-02-13 12:02:35,143 ERROR 20 handle_task got exception for task {"id": "5adc36a2e9bf11ef97940242ac120003", "doc_id": "da05ad88e9b911ef9f7b0242ac120004", "from_page": 0, "to_page": 100000000, "retry_count": 0, "kb_id": "b16deaf2e9b911ef8b450242ac120004", "parser_id": "naive", "parser_config": {"auto_keywords": 3, "auto_questions": 1, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}, "chunk_token_num": 128, "delimiter": "\n!?;\u3002\uff1b\uff01\uff1f", "layout_recognize": "DeepDOC", "html4excel": false}, "name": "11111.txt", "type": "doc", "location": "11111.txt", "size": 3735, "tenant_id": "44670308e9b911efa2950242ac120004", "language": "Chinese", "embd_id": "EntropyYue/jina-embeddings-v2-base-zh:160m@Ollama", "pagerank": 2, "kb_parser_config": {"auto_keywords": 3, "auto_questions": 1, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}}, "img2txt_id": "", "asr_id": "", "llm_id": "qwen2.5@Ollama", "update_time": 1739419354436, "task_type": ""}

Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 218, in build_chunks
    cks = chunker.chunk(task["name"], binary=binary, from_page=task["from_page"],
  File "/ragflow/rag/app/naive.py", line 250, in chunk
    callback(0.1, "Start to parse.")
  File "/ragflow/rag/svr/task_executor.py", line 134, in set_progress
    TaskService.update_progress(task_id, d)
  File "/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 3128, in inner
    return fn(*args, **kwargs)
  File "/ragflow/api/db/services/task_service.py", line 193, in update_progress
    with DB.lock("update_progress", -1):
  File "/ragflow/api/db/db_models.py", line 371, in __enter__
    self.lock()
  File "/ragflow/api/db/db_models.py", line 355, in lock
    raise Exception(f'failed to acquire lock {self.lock_name}')
Exception: failed to acquire lock update_progress

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 626, in handle_task
    do_handle_task(task)
  File "/ragflow/rag/svr/task_executor.py", line 559, in do_handle_task
    chunks = build_chunks(task, progress_callback)
  File "/ragflow/rag/svr/task_executor.py", line 225, in build_chunks
    progress_callback(-1, "Internal server error while chunking: %s" % str(e).replace("'", ""))
  File "/ragflow/rag/svr/task_executor.py", line 134, in set_progress
    TaskService.update_progress(task_id, d)
  File "/ragflow/.venv/lib/python3.10/site-packages/peewee.py", line 3128, in inner
    return fn(*args, **kwargs)
  File "/ragflow/api/db/services/task_service.py", line 193, in update_progress
    with DB.lock("update_progress", -1):
  File "/ragflow/api/db/db_models.py", line 371, in __enter__
    self.lock()
  File "/ragflow/api/db/db_models.py", line 355, in lock
    raise Exception(f'failed to acquire lock {self.lock_name}')
Exception: failed to acquire lock update_progress
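For readers unfamiliar with what the traceback is doing: the worker wraps each progress update in a named lock via `DB.lock("update_progress", -1)` (defined in api/db/db_models.py), and the exception means acquiring that lock failed, which aborts the whole task. The sketch below models only the acquire/release contract of such a named lock with a process-local `threading.Lock`; the `NamedLock` class and `_locks` registry are hypothetical stand-ins, not RAGFlow's actual implementation, which presumably takes the lock at the database level.

```python
import threading

# Hypothetical registry mapping lock names to process-local locks.
# RAGFlow's real lock is shared across worker processes via the database;
# this stand-in only demonstrates the acquire/timeout/release behavior.
_locks = {}
_registry_guard = threading.Lock()


class NamedLock:
    """Context manager mimicking DB.lock(name, timeout) from the traceback."""

    def __init__(self, name, timeout=-1):
        self.name = name
        self.timeout = timeout  # -1 means "block until acquired"
        with _registry_guard:
            self._lock = _locks.setdefault(name, threading.Lock())

    def __enter__(self):
        # threading.Lock.acquire treats timeout=-1 as "wait indefinitely";
        # a positive timeout gives up after that many seconds.
        if not self._lock.acquire(timeout=self.timeout):
            # Same failure surfaced in the log above.
            raise Exception(f"failed to acquire lock {self.name}")
        return self

    def __exit__(self, *exc):
        self._lock.release()


# Uncontended case: acquire, run the critical section, release.
with NamedLock("update_progress"):
    pass  # e.g. write the task's progress row here
```

If another holder never releases the lock (or, in the database-backed case, the backing store misbehaves, as the MariaDB discussion below suggests), any acquirer with a finite timeout fails exactly like the log shows.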

polemp avatar Feb 13 '25 04:02 polemp

Major error:
2025-02-13 12:02:35,130 INFO 20 set_progress(5adc36a2e9bf11ef97940242ac120003), progress: -1, progress_msg: 12:02:35 Page(1~100000001): [ERROR]Internal server error while chunking: failed to acquire lock update_progress
2025-02-13 12:02:35,141 INFO 20 set_progress(5adc36a2e9bf11ef97940242ac120003), progress: -1, progress_msg: 12:02:35 [ERROR][Exception]: failed to acquire lock update_progress

polemp avatar Feb 13 '25 04:02 polemp

(image attachment)

polemp avatar Feb 13 '25 04:02 polemp

Is it on a Mac? What about changing MySQL to MariaDB in docker-compose-base.yaml?

KevinHuSh avatar Feb 13 '25 06:02 KevinHuSh

Is it on a Mac? What about changing MySQL to MariaDB in docker-compose-base.yaml?

I used this image on x86 Linux and hit the same problem: the file could not be parsed. I had already changed MySQL to MariaDB, and it still happened.

juquxiang avatar Feb 13 '25 09:02 juquxiang

I'm also running on an x86 Linux virtual machine, and I'm unable to use MySQL: the MySQL Docker container fails to start. After switching to MariaDB, it started successfully. Additionally, the original MinIO image RELEASE.2023-12-10T10-51-33Z-cpuv2 failed to start, so I replaced it with RELEASE.2023-12-02T10-51-33Z-cpuv1. With those two swaps the entire system is now functional, but I have no idea how to resolve the errors above.
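For anyone reproducing this workaround, the two image swaps described above would look roughly like the following docker-compose-base.yaml fragment. The service names and the MariaDB tag are assumptions (check your own compose file); only the MinIO release tags are taken from the comment.

```yaml
services:
  mysql:
    # Stock MySQL image failed to start on this host; MariaDB started fine.
    # "mariadb:10.11" is an illustrative tag, not one confirmed by the thread.
    image: mariadb:10.11
  minio:
    # Replaced RELEASE.2023-12-10T10-51-33Z-cpuv2, which failed to start.
    image: quay.io/minio/minio:RELEASE.2023-12-02T10-51-33Z-cpuv1
```

Note this only gets the containers running; per the comment it does not make the "failed to acquire lock update_progress" error go away.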

polemp avatar Feb 14 '25 00:02 polemp

(image attachment)

Hi, I'm facing the same issue. How can I resolve it?

sammichenVV avatar Mar 25 '25 08:03 sammichenVV

#5945

KevinHuSh avatar Mar 26 '25 03:03 KevinHuSh

Hi, has this problem been solved? I've encountered it as well and can't find a solution.

daisywill avatar Apr 16 '25 06:04 daisywill