[Bug]: Error in processing document DocProcessingStatus.__init__()
Do you need to file an issue?
- [x] I have searched the existing issues and this bug is not already filed.
- [x] I believe this is a legitimate bug, not just a question or feature request.
Describe the bug
I'm using RAG-Anything and facing this error: Failed to process: DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed'
I think the DocProcessingStatus class does not define this field.
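For readers unfamiliar with this kind of failure, here is a minimal sketch of the mechanism, not LightRAG's actual code. `StatusWithoutField`, its two fields, and the `record` dict are hypothetical stand-ins; the assumption is that a stored status record carries a `multimodal_processed` key and is unpacked into a dataclass that does not declare it.

```python
from dataclasses import dataclass


@dataclass
class StatusWithoutField:
    """Hypothetical stand-in for a status class that lacks the new field."""
    content_summary: str
    status: str


# Hypothetical stored record that carries the extra key.
record = {
    "content_summary": "demo",
    "status": "processed",
    "multimodal_processed": True,
}


def try_load(cls, data):
    """Return an instance, or the TypeError message if construction fails."""
    try:
        return cls(**data)
    except TypeError as exc:
        return str(exc)


result = try_load(StatusWithoutField, record)
print(result)  # the message names the unexpected keyword argument
```

If this matches what happens here, the error would come from a mismatch between the data on disk and the class definition in the installed package.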
Steps to reproduce
No response
Expected Behavior
No response
LightRAG Config Used
Paste your config here
Logs and screenshots
No response
Additional Information
- LightRAG Version: 1.4.5
- Operating System: WSL Ubuntu 24.04
- Python Version: 3.12
this looks like a classic case of what we call bootstrap ordering failure — where certain downstream logic (like document processing or status assignment) is triggered before the schema or index layer is fully registered.
we’ve seen this surface in many RAG-style systems, especially when multi-layer modules (like retrievers, processors, and status handlers) don’t share a stable init contract or fire out-of-order.
it maps directly to what we call ProblemMap No.14 – "Bootstrap Ordering".
we’ve been building diagnostic routines + guard layers around this to prevent silent failures during cold start. it’s MIT-licensed and backed by tesseract.js author.
happy to share the setup if you’re interested — just let me know.
Thanks for the detailed explanation — that makes a lot more sense now. I'd love to try the diagnostic + guard setup you've mentioned. Please do share the setup or point me to the repo/docs if possible.
ah great
this exact failure is mapped in our diagnostic system under “Bootstrap Ordering”: https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md
basically: if downstream logic (like DocStatus init) gets triggered before the OCR/chunk layers are fully deployed or warmed up, stuff breaks in weird ways — sometimes even silently.
we ran into this in multi-agent, async pipelines. fixed it by forcing certain modules to delay booting until upstream output was structurally valid. not just about timing — about dependency flow.
let me know which part you're testing and I can point you to the right patch structure. whole thing is MIT licensed if you wanna copy anything.
I finally stopped getting this when I deleted the cache that RAGAnything was using!
How did you do that?
Remove the tmp directories output and rag_storage.
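A small sketch of that cleanup, assuming both directories sit under the current working directory. `clear_rag_cache` is a hypothetical helper name, not part of RAG-Anything or LightRAG; adjust `base_dir` if your `working_dir` or output path is configured elsewhere.

```python
import shutil
from pathlib import Path


def clear_rag_cache(base_dir=".", names=("output", "rag_storage")):
    """Delete the named directories under base_dir and return those removed."""
    removed = []
    for name in names:
        path = Path(base_dir) / name
        if path.is_dir():
            shutil.rmtree(path)  # removes the directory and everything in it
            removed.append(name)
    return removed


if __name__ == "__main__":
    print(clear_rag_cache())
```

Note that this discards the stored index as well, so documents have to be processed again before they can be queried.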
You need rag_storage to initialize an existing RAG instance and output to perform a query. But the question is: How is a query processed once the entire cache has been cleared?
any update?
Getting the same error with raganything version 1.2.8 and Python 3.10. Any update on how to fix it? I tried clearing the output folder and rerunning the Jupyter cell. Now even the images are not re-extracted, and I started getting this error again.
Adding multimodal_processed: bool | None = False to the DocProcessingStatus class in base.py of the lightrag package seems to solve the problem for me.
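To illustrate why the extra field helps, here is a sketch using a hypothetical stand-in class. `PatchedStatus` and its first two fields are made up for the example; only the `multimodal_processed: bool | None = False` line mirrors the change described above. The real DocProcessingStatus in lightrag has other fields that are not reproduced here.

```python
from __future__ import annotations  # lets `bool | None` parse on older Pythons

from dataclasses import dataclass


@dataclass
class PatchedStatus:
    """Hypothetical stand-in for the status class after the one-line patch."""
    content_summary: str
    status: str
    # The added field; the default keeps older records without the key loadable.
    multimodal_processed: bool | None = False


# Hypothetical record that carries the extra key.
record = {
    "content_summary": "demo",
    "status": "processed",
    "multimodal_processed": True,
}

patched = PatchedStatus(**record)  # no TypeError now
legacy = PatchedStatus(content_summary="old", status="processed")

print(patched.multimodal_processed, legacy.multimodal_processed)
```

Keep in mind that an edit made directly inside an installed package is lost when the package is reinstalled or upgraded.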