
[Bug]: Certain PDF crashes RAG pipeline

Open rjakomin opened this issue 10 months ago • 2 comments

Steps to reproduce

Hi, whenever I copy the attached PDF file into the data folder monitored by the private RAG pipeline, the engine crashes (the whole app Docker container goes down) without any error message. It happens every time with this PDF document: https://www.dancilla.com/PDF/Dancilla_alle_Volkstaenze.pdf

Relevant log output

2025-02-25 14:59:40 pathway_engine.connectors.monitoring INFO FileSystem(data): 1 entries (3530 minibatch(es)) have been sent to the engine
2025-02-25 15:00:32 root INFO {"_type": "request_payload", "session_id": "uuid-29d4de7b-6cd3-4f92-ab6e-5111029c3157", "payload": {}}
2025-02-25 15:00:37 root INFO {"_type": "request_payload", "session_id": "uuid-65828918-7085-4c99-8385-7a9c93320895", "payload": {}}
2025-02-25 15:00:44 pathway_engine.connectors.monitoring INFO FileSystem(data): 0 entries (1 minibatch(es)) have been sent to the engine
2025-02-25 15:00:44 pathway_engine.connectors.monitoring INFO PythonReader: 2 entries (87119 minibatch(es)) have been sent to the engine
2025-02-25 15:01:02 pathway_engine.connectors.monitoring INFO PythonReader: 0 entries (5 minibatch(es)) have been sent to the engine

What did you expect to happen?

The newly copied PDF file is processed and its content is added to the vector database used by the private RAG.

Version

current

Docker Versions (if used)

27.4.0, build bde2b89 (running on Windows 11)

OS

Windows 11

rjakomin avatar Feb 25 '25 15:02 rjakomin

Thank you for the report @rjakomin. While we investigate this, could you share any relevant statistics (like docker stats memory, CPU usage profile directly before the crash) which could explain the cause of the crash?
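
One possible way to collect the statistics requested above is to sample `docker stats` in a loop while copying the PDF into the data folder, so that the last lines of the log show memory and CPU usage directly before the crash. This is only a sketch: the log file name `docker_stats.log` and the one-second interval are arbitrary choices, and it assumes the `docker` CLI is on the PATH and prints its usual `MemUsage` format (for example `1.5GiB / 7.6GiB`).

```python
import json
import subprocess
import time

# Size suffixes as printed by `docker stats`.
_UNITS = {
    "B": 1,
    "kB": 1000, "MB": 1000**2, "GB": 1000**3,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4,
}


def parse_size(text):
    """Convert a size string such as '1.5GiB' into a number of bytes."""
    text = text.strip()
    # Try longer suffixes first so 'GiB' is not mistaken for plain 'B'.
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * _UNITS[suffix]
    raise ValueError(f"unrecognised size: {text!r}")


def parse_stats_line(line):
    """Parse one JSON line produced by `docker stats --format '{{json .}}'`."""
    record = json.loads(line)
    used, limit = record["MemUsage"].split("/")
    return {
        "name": record["Name"],
        "cpu_percent": float(record["CPUPerc"].rstrip("%")),
        "mem_used_bytes": parse_size(used),
        "mem_limit_bytes": parse_size(limit),
    }


def sample_stats():
    """Run `docker stats` once and return one parsed record per running container."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    )
    return [parse_stats_line(l) for l in result.stdout.splitlines() if l.strip()]


if __name__ == "__main__":
    # Append timestamped samples until interrupted (Ctrl+C) or until docker fails.
    with open("docker_stats.log", "a", encoding="utf-8") as log:
        while True:
            for record in sample_stats():
                record["time"] = time.strftime("%Y-%m-%d %H:%M:%S")
                log.write(json.dumps(record) + "\n")
            log.flush()  # keep the file current in case the host is under pressure
            time.sleep(1)
```

If memory usage climbs toward the limit in the final samples, that would be worth mentioning in the issue alongside the log.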

dxtrous avatar Feb 25 '25 18:02 dxtrous

Hi @rjakomin, thanks for your report. I ran multiple tests in various environments, including a Docker container on Windows 11. No problem occurs with the PDF file you provided: the file is read correctly and there is no crash.

Could you provide additional details about your environment, along with the Docker statistics that @dxtrous mentioned?

XGendre avatar Feb 26 '25 15:02 XGendre

Hey @rjakomin,

I'm closing this issue, since we're unable to reproduce it. Please note that Pathway has had many version updates since the report, and fixes have been merged. Feel free to reopen the issue if the problem persists, and make sure to include the docker stats output, which may help identify the root cause of the problem.

zxqfd555 avatar Oct 17 '25 11:10 zxqfd555
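
For anyone who hits the same silent crash, a container that dies without an error message can be checked after the fact with `docker inspect`, whose `State` object records the exit code and whether the kernel OOM killer was involved. The sketch below is not something the thread confirms as the cause. It only illustrates the check, and the container name `rag-app` is a hypothetical placeholder.

```python
import json
import subprocess


def describe_exit(state):
    """Summarise the `State` object returned by `docker inspect`."""
    if state.get("Running"):
        return "still running"
    if state.get("OOMKilled"):
        return "killed by the kernel OOM killer (memory limit reached)"
    code = state.get("ExitCode", 0)
    if code == 137:
        # 137 = 128 + 9 (SIGKILL); this often, but not always, accompanies an OOM kill.
        return "exited with code 137 (SIGKILL)"
    if code == 0:
        return "exited cleanly"
    return f"exited with code {code}"


def inspect_state(container):
    """Return the parsed `State` object for a (possibly stopped) container."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State}}", container],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # 'rag-app' is a placeholder; substitute the real container name or ID.
    print(describe_exit(inspect_state("rag-app")))
```

Running it right after the crash, before the container is removed or recreated, gives the most useful result.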