
[Bug]: Error in processing document DocProcessingStatus.__init__()

Open vanlinhtruongdang opened this issue 5 months ago • 10 comments

Do you need to file an issue?

  • [x] I have searched the existing issues and this bug is not already filed.
  • [x] I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

I'm using RAG-Anything and hitting this error: Failed to process: DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed'

I think the DocProcessingStatus class does not define this field.
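The message is the standard Python TypeError raised when a dataclass constructor receives a keyword it does not declare. Below is a minimal sketch, assuming DocProcessingStatus is a dataclass built from stored dict records by keyword unpacking. The stand-in class and its two fields are hypothetical, not LightRAG's real definition. On recent Python versions it prints a message of the same shape as the one reported above.

```python
from dataclasses import dataclass


@dataclass
class DocStatusStandIn:
    """Hypothetical, trimmed stand-in for lightrag's DocProcessingStatus."""

    file_path: str
    status: str


# A stored record carrying a key the dataclass does not declare,
# like the "multimodal_processed" flag from the error message.
record = {"file_path": "report.pdf", "status": "processed", "multimodal_processed": True}

error_message = ""
try:
    DocStatusStandIn(**record)
except TypeError as exc:
    # Prints the TypeError message naming the unexpected keyword.
    error_message = str(exc)
    print(error_message)
```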

Steps to reproduce

No response

Expected Behavior

No response

LightRAG Config Used

Paste your config here

Logs and screenshots

No response

Additional Information

  • LightRAG Version: 1.4.5
  • Operating System: WSL Ubuntu24.04
  • Python Version: 3.12

vanlinhtruongdang · Aug 04 '25 11:08

this looks like a classic case of what we call bootstrap ordering failure — where certain downstream logic (like document processing or status assignment) is triggered before the schema or index layer is fully registered.

we’ve seen this surface in many RAG-style systems, especially when multi-layer modules (like retrievers, processors, and status handlers) don’t share a stable init contract or fire out-of-order.

it maps directly to what we call ProblemMap No.14 – "Bootstrap Ordering".
we’ve been building diagnostic routines + guard layers around this to prevent silent failures during cold start. it’s MIT-licensed and backed by the tesseract.js author.

happy to share the setup if you’re interested — just let me know.

onestardao · Aug 04 '25 13:08

Thanks for the detailed explanation — that makes a lot more sense now. I'd love to try the diagnostic + guard setup you've mentioned. Please do share the setup or point me to the repo/docs if possible.

vanlinhtruongdang · Aug 05 '25 07:08

ah great

this exact failure is mapped in our diagnostic system under “Bootstrap Ordering”: https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md

basically: if downstream logic (like DocStatus init) gets triggered before the OCR/chunk layers are fully deployed or warmed up, stuff breaks in weird ways — sometimes even silently.

we ran into this in multi-agent, async pipelines. fixed it by forcing certain modules to delay booting until upstream output was structurally valid. not just about timing — about dependency flow.

let me know which part you're testing and I can point you to the right patch structure. whole thing is MIT licensed if you wanna copy anything.

onestardao · Aug 05 '25 10:08

I finally stopped getting this when I deleted the cache that RAGAnything was using!

go-redrock-nicolas · Aug 13 '25 19:08

> I finally stopped getting this when I deleted the cache that RAGAnything was using!

How do you make it?

Larrybxs · Aug 21 '25 03:08

> I finally stopped getting this when I deleted the cache that RAGAnything was using!
>
> How do you make it?

Remove the temporary directories output and rag_storage.

tcluzhe · Aug 21 '25 11:08
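For anyone unsure how to do that cleanup, here is a minimal Python sketch. It assumes both directories sit under the current working directory with the names given in the comment above. If you configured a different working directory or output directory for RAG-Anything, pass those names instead. The helper name clear_rag_cache is hypothetical. This permanently deletes everything already indexed, so back up anything you still need first.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def clear_rag_cache(base_dir: str = ".", dirs: tuple[str, ...] = ("output", "rag_storage")) -> list[str]:
    """Delete the given cache directories under base_dir; return the names removed.

    The default names are assumptions taken from the thread; adjust them to
    the directories your own RAG-Anything setup actually uses.
    """
    removed = []
    for name in dirs:
        path = Path(base_dir) / name
        # Skip directories that do not exist instead of raising.
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(name)
    return removed


if __name__ == "__main__":
    print(clear_rag_cache())
```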

> I finally stopped getting this when I deleted the cache that RAGAnything was using!

You need rag_storage to initialize an existing RAG instance, and output to run queries. So the question is: how is a query processed once the entire cache has been cleared?

frngo001 · Aug 27 '25 13:08
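One way to keep an existing cache, sketched here as an assumption rather than a LightRAG feature, is to drop keys the status dataclass does not declare before constructing it. The helper from_record_ignoring_unknown and the stand-in class are hypothetical. Using this approach would mean patching wherever your LightRAG version turns stored status records into DocProcessingStatus objects.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any, TypeVar

T = TypeVar("T")


def from_record_ignoring_unknown(cls: type[T], record: dict[str, Any]) -> T:
    """Hypothetical helper: build a dataclass, silently dropping undeclared keys."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in record.items() if k in known})


@dataclass
class DocStatusStandIn:
    # Hypothetical, trimmed stand-in for DocProcessingStatus.
    file_path: str
    status: str


# The extra "multimodal_processed" key is discarded instead of raising TypeError.
record = {"file_path": "report.pdf", "status": "processed", "multimodal_processed": True}
doc = from_record_ignoring_unknown(DocStatusStandIn, record)
print(doc)
```

The trade-off is that the multimodal_processed value is lost on load, and RAG-Anything may depend on it. The field-adding fix later in the thread keeps that value.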

any update?

Jester6136 · Sep 10 '25 03:09

Getting the same error with raganything 1.2.8 and Python 3.10. Any update on how to fix it? I tried clearing the output folder and rerunning the Jupyter cell. Now the images are not even re-extracted, and I started getting this error again.

cricketplayer · Oct 07 '25 02:10

Adding multimodal_processed: bool | None = False to the DocProcessingStatus class in base.py of the lightrag package seems to solve the problem for me.

IDavron · Oct 16 '25 10:10
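Here is a sketch of why that fix works. It uses a hypothetical trimmed stand-in rather than the real class. Once the field is declared with a default, records that carry the key and older records that lack it both construct cleanly.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DocStatusStandIn:
    # Hypothetical, trimmed stand-in for DocProcessingStatus in lightrag/base.py.
    file_path: str
    status: str
    # The field suggested above; the default keeps older records that
    # lack the key loadable.
    multimodal_processed: bool | None = False


new_style = DocStatusStandIn(**{"file_path": "report.pdf", "status": "processed", "multimodal_processed": True})
old_style = DocStatusStandIn(**{"file_path": "notes.txt", "status": "processed"})
print(new_style, old_style)
```

In a dataclass, the new field must come after every field without a default, for example at the end of the class. Otherwise class creation fails with a TypeError. Also, edits made directly to an installed package are lost when it is reinstalled or upgraded.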