[Bug]: pdfminer crash at start (Adaptive RAG)

Open rjakomin opened this issue 9 months ago • 6 comments

Steps to reproduce

Hello,

after 1 month of inactivity my RAG no longer works, although I have not changed anything. The container starts, but questions sent to the cloud LLM through the Pathway container no longer take my local data into account.

The error I notice in the adaptiverag container is: pathway_engine.engine.dataflow ERROR ImportError: cannot import name 'PSSyntaxError' from 'pdfminer.pdfparser' (/usr/local/lib/python3.11/site-packages/pdfminer/pdfparser.py) in operator 12.

It might be the cause.

I have already tried updating the sources from your site, but to no avail: I still get the same error in the log.

Can you please help me? It is urgent, thanks in advance.

Relevant log output

pathway_engine.engine.dataflow ERROR ImportError: cannot import name 'PSSyntaxError' from 'pdfminer.pdfparser' (/usr/local/lib/python3.11/site-packages/pdfminer/pdfparser.py) in operator 12.

What did you expect to happen?

The questions sent to the adaptive RAG should consider the local data.

Version

latest

Docker Versions (if used)

27.4

OS

Linux

On which CPU architecture did you run Pathway?

None

rjakomin avatar Apr 03 '25 09:04 rjakomin

Hi, thank you for reporting the issue. We will look into this and get back to you.

bjornengdahl avatar Apr 03 '25 09:04 bjornengdahl

I encountered the same issue. It seems to be a problem with the pdfminer dependency in the latest pathway docker image. I was able to get it to work by reverting my image to the 0.20.1 tag.

bockisn avatar Apr 08 '25 15:04 bockisn

@bockisn thank you very much for sharing, it works with the mentioned tag. I changed the beginning of the Dockerfile to: FROM pathwaycom/pathway:0.20.1

rjakomin avatar Apr 09 '25 08:04 rjakomin

Hi @rjakomin @bockisn, this issue will be resolved in the next deployment. It was caused by the bump of the pdfminer.six dependency in pdfplumber's latest release.

For now, your suggestion should work.

berkecanrizai avatar Apr 09 '25 15:04 berkecanrizai

OK, great, thanks for the notice.

rjakomin avatar Apr 09 '25 15:04 rjakomin

Hey, @rjakomin, the new Pathway version has been released. The problem should be fixed in the pathwaycom/pathway:0.21.3 docker image.

pw-ppodhajski avatar Apr 16 '25 07:04 pw-ppodhajski

Hey @rjakomin, I am closing this issue, since the problem must have been resolved with the release 0.21.3. Please feel free to reopen this issue or to create a new one if you have any problems with Pathway.

zxqfd555 avatar Oct 13 '25 15:10 zxqfd555
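
For anyone landing here with the same traceback, a quick way to confirm whether a given image is affected is to retry the failing import inside the container and print the installed versions of the packages mentioned in this thread. The snippet below is only a minimal sketch: the helper names (installed_version, check_pdfminer_import, diagnose) are made up for illustration and are not part of Pathway, and it assumes the distributions are registered under the names pdfminer.six, pdfplumber and pathway.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pdfminer_import():
    """Retry the exact import named in the traceback; return (ok, detail)."""
    try:
        # This is the import that fails in the reported log line.
        from pdfminer.pdfparser import PSSyntaxError  # noqa: F401
    except ImportError as exc:
        # Covers both a missing module and a missing name.
        return False, str(exc)
    return True, "import succeeded"


def diagnose():
    """Collect versions and the import result in one dictionary."""
    ok, detail = check_pdfminer_import()
    return {
        # Distribution names below are assumptions, see the note above.
        "pdfminer.six": installed_version("pdfminer.six"),
        "pdfplumber": installed_version("pdfplumber"),
        "pathway": installed_version("pathway"),
        "import_ok": ok,
        "detail": detail,
    }


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

If import_ok comes back False, the image is most likely hitting the dependency mismatch described above, and pinning the base image to 0.20.1 or moving to 0.21.3, as suggested in the comments, should be the first thing to try.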