Stuck at "Entity Extraction" phase with no progress for hours

Open · gihunsong opened this issue 11 months ago · 4 comments

Hi, thanks for your great work on VideoRAG!

I'm trying to run a small test using the longervideos setting with just two videos (both around 30 minutes long), but the process seems to get stuck after the following log message:

```
INFO:nano-graphrag:[Entity Extraction]...
```

There are no error messages after this, but the program doesn’t proceed even after several hours of waiting.

Here’s a relevant portion of the log for context:

```
INFO:nano-graphrag:Load KV video_path with 2 data
INFO:nano-graphrag:Load KV video_segments with 2 data
INFO:nano-graphrag:Load KV text_chunks with 0 data
INFO:nano-graphrag:Load KV llm_response_cache with 0 data
INFO:nano-vectordb:Init {'embedding_dim': 768, 'metric': 'cosine', 'storage_file': './videorag-workdir-ollama-long/vdb_entities.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 768, 'metric': 'cosine', 'storage_file': './videorag-workdir-ollama-long/vdb_chunks.json'} 0 data
INFO:nano-vectordb:Load (135, 1024) data
INFO:nano-vectordb:Init {'embedding_dim': 1024, 'metric': 'cosine', 'storage_file': './videorag-workdir-ollama-long/vdb_video_segment_feature.json'} 135 data
INFO:nano-graphrag:Find the video named _FLt-6AMbx8.webm in storage and skip it.
INFO:nano-graphrag:Find the video named w8Wt3K1DgDw.mp4 in storage and skip it.
INFO:nano-graphrag:[New Chunks] inserting 45 chunks
INFO:nano-graphrag:Insert chunks for naive RAG
INFO:nano-graphrag:Inserting 45 vectors to chunks
INFO:nano-graphrag:[Entity Extraction]...
```

Just to add: I was able to run the pipeline successfully with a short 1-minute video, so the basic setup seems to be working. The issue only appears when I use longervideos (even just two of them).

Is this expected behavior for long videos, or could something be going wrong in the entity extraction phase?

Any guidance would be greatly appreciated. Thanks again!

By the way, I'm running on an RTX A6000 (48 GB) and using `ollama_config`.

gihunsong, Mar 26 '25 00:03
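Because the run hangs with no error message, one way to turn a stalled LLM request into a visible failure is to wrap each call in a timeout. The following is a minimal sketch built on the standard library's `asyncio.wait_for`. `call_with_timeout`, `fake_llm_call`, and the 600 s default are hypothetical names and values chosen for illustration. They are not part of VideoRAG or nano-graphrag.

```python
import asyncio


async def call_with_timeout(coro_fn, *args, timeout_s: float = 600.0, **kwargs):
    """Await coro_fn(*args, **kwargs), raising TimeoutError instead of hanging forever.

    Hypothetical helper for illustration; not a VideoRAG or nano-graphrag API.
    """
    try:
        # wait_for cancels the inner awaitable if it exceeds timeout_s.
        return await asyncio.wait_for(coro_fn(*args, **kwargs), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Re-raise with the name of the stalled call so the log shows where it hung.
        name = getattr(coro_fn, "__name__", repr(coro_fn))
        raise TimeoutError(f"{name} did not finish within {timeout_s}s") from None


# Stand-in for an LLM request that takes `delay` seconds to answer.
async def fake_llm_call(prompt: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return f"response to {prompt!r}"


async def main() -> None:
    print(await call_with_timeout(fake_llm_call, "fast", 0.01, timeout_s=1.0))
    try:
        await call_with_timeout(fake_llm_call, "slow", 5.0, timeout_s=0.1)
    except TimeoutError as exc:
        print("caught:", exc)


if __name__ == "__main__":
    asyncio.run(main())
```

With a timeout in place, a hung request shows up as an exception naming the call. A silent stall is harder to diagnose.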

Hi 👋!

Thanks for your interest! In our tests with OpenAI LLMs, the entity extraction phase does not take long.

We just updated the code to fix the max_async bug in LLM calling. It's possible that the issue you mentioned was caused by the previous code sending too many requests at once to the Ollama server, which made it hang. Please try the new code :)

Best regards, Xubin

Re-bin, Mar 26 '25 16:03

Thank you for your fast response. I'll try running it again and let you know if I have any issues.

By the way, I left a comment on https://github.com/HKUDS/VideoRAG/commit/d82858c3b632611392b1071fa73f1db801337f1d#comments. Could you take a look?

gihunsong, Mar 27 '25 02:03

No luck: it's still stuck in the entity extraction phase :( Is Ollama working well on your side?

gihunsong, Mar 27 '25 05:03
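To check from the client side whether the Ollama server is answering at all, one option is to query its `GET /api/tags` endpoint, which lists locally pulled models, with a short timeout. The following is a minimal standard-library sketch that assumes Ollama's default address `http://localhost:11434`. The function name is hypothetical. A successful reply only shows that the server is reachable. It does not show that a long-running generation is making progress.

```python
import http.client
import json
import urllib.request


def ollama_is_responsive(base_url: str = "http://localhost:11434", timeout_s: float = 5.0) -> bool:
    """Return True if the Ollama server answers GET /api/tags within timeout_s seconds."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout_s) as resp:
            payload = json.load(resp)
    except (OSError, http.client.HTTPException, ValueError):
        # Refused connection, timeout, HTTP error, or a non-JSON reply:
        # treat all of these as "not responsive".
        return False
    # /api/tags returns a JSON object with the pulled models under "models".
    return isinstance(payload, dict) and "models" in payload


if __name__ == "__main__":
    print("Ollama responsive:", ollama_is_responsive())
```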

Hey, I'm actually running into the same problem on the regular lightrag-server.

I'm running it with Ollama as well, using the configuration below, on an NVIDIA Quadro P4000. The first 7 of 9 chunks finish within minutes, but the 8th takes a few hours, and I canceled the 9th after 1.5 days with the GPU running continuously at 90-100%. I already have a couple of documents inserted, which took about 10 minutes each; maybe this contributes to the long processing time. It would be great to have a solution, though.

Thank you!

```
LLM_BINDING=ollama
LLM_MODEL=qwen2.5:3b
LLM_BINDING_HOST=http://localhost:11434
MAX_TOKENS=8192

EMBEDDING_BINDING=ollama
EMBEDDING_BINDING_HOST=http://localhost:11434
EMBEDDING_MODEL=nomic-embed-text
EMBEDDING_DIM=768
```

Here is the lightrag-server output:

```
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:9621 (Press CTRL+C to quit)
INFO: 127.0.0.1:55529 - "GET /docs HTTP/1.1" 200
INFO: 127.0.0.1:55528 - "GET /graph/label/list HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:55529 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:55529 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:55536 - "POST /documents/scan HTTP/1.1" 200
INFO: Found 1 new files to index.
INFO: Processing batch 1/1 with 1 files
INFO: Inserting 1 records to doc_status
INFO: Process 48288 doc status writting 5 records to doc_status
INFO: Stored 1 new unique documents
INFO: Successfully fetched and enqueued file: 10 1016@j ijhydene 2020 07 180 (2)_p.txt
INFO: Processing 1 document(s) in 1 batches
INFO: Start processing batch 1 of 1.
INFO: Inserting 1 records to doc_status
INFO: Process 48288 doc status writting 5 records to doc_status
INFO: Inserting 9 to chunks
INFO: Starting entity and relation extraction.
INFO: Inserting 1 records to full_docs
INFO: Inserting 9 records to text_chunks
INFO: Inserting 1 records to llm_response_cache
INFO: Inserting 1 records to llm_response_cache
INFO: Inserting 1 records to llm_response_cache
INFO: Inserting 1 records to llm_response_cache
INFO: Inserting 1 records to llm_response_cache
INFO: Inserting 1 records to llm_response_cache
INFO: Inserting 1 records to llm_response_cache
INFO: Inserting 1 records to llm_response_cache
INFO: Inserting 1 records to llm_response_cache
INFO: Chk 1/9: extracted 11 Ent + 3 Rel (deduplicated)
INFO: Inserting 1 records to llm_response_cache
INFO: Chk 2/9: extracted 17 Ent + 10 Rel (deduplicated)
INFO: Inserting 1 records to llm_response_cache
INFO: Chk 3/9: extracted 0 Ent + 0 Rel (deduplicated)
INFO: Inserting 1 records to llm_response_cache
INFO: Chk 4/9: extracted 4 Ent + 2 Rel (deduplicated)
INFO: Inserting 1 records to llm_response_cache
INFO: Chk 5/9: extracted 5 Ent + 2 Rel (deduplicated)
INFO: Inserting 1 records to llm_response_cache
INFO: Chk 6/9: extracted 32 Ent + 9 Rel (deduplicated)
INFO: Inserting 1 records to llm_response_cache
INFO: Chk 7/9: extracted 12 Ent + 8 Rel (deduplicated)
INFO: 127.0.0.1:55871 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:55871 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:55871 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:55871 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:55871 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:58187 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:58187 - "GET /documents/pipeline_status HTTP/1.1" 401
INFO: 127.0.0.1:58187 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:58187 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:58187 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:58187 - "GET /openapi.json HTTP/1.1" 200
INFO: Inserting 1 records to llm_response_cache
INFO: Inserting 1 records to llm_response_cache
INFO: Chk 8/9: extracted 0 Ent + 0 Rel (deduplicated)
INFO: 127.0.0.1:59605 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:59605 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:59605 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:59605 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:59605 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:61048 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:61048 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:61048 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:61048 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:61048 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:62387 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:62387 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:62387 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:62387 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:62387 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:63829 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:63829 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:63829 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:63838 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:63838 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:65196 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:65196 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:65196 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:65205 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:65196 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:50482 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:50482 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:50482 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:50490 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:50490 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:51870 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:51870 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:51870 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:51881 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:51881 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:53207 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:53207 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:53207 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:53218 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:53218 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:54572 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:54572 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:54572 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:54579 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:54579 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:56040 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:56040 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:56040 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:56049 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:56049 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:57318 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:57318 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:57318 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:57324 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:57324 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:58717 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:58717 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:58717 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:58727 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:58727 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:60071 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:60071 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:60071 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:60079 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:60079 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:61433 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:61433 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:61433 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:61439 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:61433 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:62798 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:62798 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:62798 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:62803 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:62803 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:64125 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:64125 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:64125 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:64133 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:64133 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:49281 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:49281 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:49281 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:49291 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:49291 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:51318 - "GET /documents/pipeline_status HTTP/1.1" 401
INFO: 127.0.0.1:51318 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:51318 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:51318 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:51382 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:51382 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:53090 - "GET /documents/pipeline_status HTTP/1.1" 401
INFO: 127.0.0.1:53090 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:53090 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:53090 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:53090 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:54859 - "GET /documents/pipeline_status HTTP/1.1" 401
INFO: 127.0.0.1:54860 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:54859 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:54859 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:54860 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:54860 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:56395 - "GET /documents/pipeline_status HTTP/1.1" 401
INFO: 127.0.0.1:56395 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:56395 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:56395 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:56412 - "GET /graphs?label=&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:56412 - "GET /openapi.json HTTP/1.1" 200
INFO: 127.0.0.1:58118 - "GET /documents HTTP/1.1" 401
INFO: 127.0.0.1:58118 - "GET /auth-status HTTP/1.1" 200
INFO: 127.0.0.1:58118 - "GET /docs HTTP/1.1" 200
INFO: Subgraph query successful | Node count: 930 | Edge count: 343
INFO: 127.0.0.1:58139 - "GET /graphs?label=*&max_depth=3&min_degree=0 HTTP/1.1" 200
INFO: 127.0.0.1:58139 - "GET /openapi.json HTTP/1.1" 200
```

jak13h, Mar 31 '25 16:03
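Most of the lightrag-server log above consists of the web UI repeatedly polling the server (`/documents`, `/auth-status`, `/docs`, `/graphs`). The extraction progress appears only in the `Chk i/n` entries, which stop at `Chk 8/9`. A small filter makes this easier to see when pasting long logs. The following is a minimal sketch. `chunk_progress` is a hypothetical helper, and the regex assumes the exact `Chk i/n: extracted X Ent + Y Rel` wording shown in this log. Other LightRAG versions may word the message differently.

```python
import re
from typing import List, Tuple

# Matches progress entries such as
# "Chk 7/9: extracted 12 Ent + 8 Rel (deduplicated)" as they appear in the log above.
CHUNK_RE = re.compile(r"Chk (\d+)/(\d+): extracted (\d+) Ent \+ (\d+) Rel")


def chunk_progress(log_text: str) -> List[Tuple[int, ...]]:
    """Return (chunk, total, entities, relations) for each progress entry in log_text."""
    return [tuple(int(g) for g in m.groups()) for m in CHUNK_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "INFO: Chk 7/9: extracted 12 Ent + 8 Rel (deduplicated) "
        'INFO: 127.0.0.1:55871 - "GET /documents HTTP/1.1" 401 '
        "INFO: Chk 8/9: extracted 0 Ent + 0 Rel (deduplicated)"
    )
    for chunk, total, ents, rels in chunk_progress(sample):
        print(f"chunk {chunk}/{total}: {ents} entities, {rels} relations")
```

Comparing the last reported chunk number with the total shows immediately which chunk the run is stuck on.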