pdf2image

Corrupt JPEG data: Premature end of data segment

Open mrxiaohe opened this issue 6 years ago • 14 comments

I am trying to convert a set of PDF documents into JPEGs. While converting, I occasionally get a pop-up error message, as shown in the screenshot below. I wonder what may have caused this and whether there is a way to bypass it, since the conversion process stalls while the error dialog is shown and only resumes once I close it.

image

mrxiaohe avatar Feb 23 '19 14:02 mrxiaohe

Do you have a PDF I could use to reproduce the error on my side?

Belval avatar Feb 23 '19 14:02 Belval

Closing for inactivity.

Belval avatar Mar 27 '19 23:03 Belval

I have a document that triggers the same error, but I cannot share it. If you can suggest what to capture, I can send you logs. The error appears when pdftoppm is spawned and the caller waits for the process to complete: it waits indefinitely until the OK button on the error popup is pressed. I am attaching screenshots from before and after dismissing the popup.

Popup error: ImagePopUpError

After pressing OK on the popup: ImagePopUpError-After

bikashgupta11 avatar Mar 29 '19 10:03 bikashgupta11
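The stall described above (a caller waiting forever on a spawned pdftoppm) can be bounded with a timeout on the child process. The following is a minimal sketch using only the Python standard library; `run_with_timeout` is a hypothetical helper name, not part of pdf2image, and whether killing the process also dismisses the dialog on a given Windows setup has not been verified here.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (returncode, stdout, stderr).

    Returns None if the process does not finish within timeout_s seconds;
    in that case the process is killed so the caller never blocks forever.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        out, err = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()          # stop the stalled child
        proc.communicate()   # reap it and drain its pipes
        return None
    return proc.returncode, out, err


# Example (assumes pdftoppm is on PATH; file names are placeholders):
# result = run_with_timeout(["pdftoppm", "-jpeg", "input.pdf", "out"], 60)
# if result is None:
#     print("pdftoppm did not finish in time; skipping this file")
```

Returning None on timeout lets a batch job log the file and move on instead of hanging on one document.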

Hmm, OK. I'll bring this up with the poppler team; maybe there is an environment variable that can be set to prevent these.

Belval avatar Mar 29 '19 10:03 Belval

Thanks. Just for information, I am using poppler for Windows.

bikashgupta11 avatar Mar 29 '19 11:03 bikashgupta11

Good news: Found the cause.

Bad news: It's not overridable without recompiling libjpeg.

https://github.com/libjpeg-turbo/libjpeg-turbo/blob/master/jerror.c#L104

I think I might just compile poppler on my side and distribute that instead of the alivate one.

Belval avatar Mar 31 '19 15:03 Belval

In my current work, I've been using pdf2image just to get the image object of the first page of a PDF. This is how I've been using it:

convert_from_path(filePathName, dpi=400, first_page=1, last_page=1, fmt='jpg')[0]

The issue in my case is that it produces a similar JPEG error, but the message is different: ErrorCapture

I've checked the PDF file and there doesn't seem to be anything wrong with it. While the issue of Windows error dialog boxes is still being worked on, is there any way to suppress the dialog box from our Python script and continue with the remaining tasks?

bhutraharish avatar Aug 07 '19 03:08 bhutraharish
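For reproducing the problem outside of Python, the first-page conversion above corresponds roughly to a direct pdftoppm invocation. The sketch below only builds the argument list; `first_page_jpeg_command` is a hypothetical helper, and the exact command pdf2image assembles internally may differ.

```python
def first_page_jpeg_command(pdf_path, out_prefix, dpi=400):
    """Build a pdftoppm command that renders only page 1 as a JPEG."""
    return [
        "pdftoppm",
        "-jpeg",            # write JPEG output
        "-r", str(dpi),     # resolution in DPI
        "-f", "1",          # first page to convert
        "-l", "1",          # last page to convert
        "-singlefile",      # write exactly out_prefix.jpg, no page suffix
        pdf_path,
        out_prefix,
    ]


# Example (assumes pdftoppm is on PATH; paths are placeholders):
# import subprocess
# subprocess.run(first_page_jpeg_command("input.pdf", "page1"), check=True)
```

Running the command by hand in a terminal shows whether the dialog comes from pdftoppm itself rather than from the Python wrapper.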

If there is, I haven't found it. It's packed in the library and there is no env var to deactivate it.

On the other hand, I will have more time to spare on this project shortly so I will probably be able to rebuild said library.

Belval avatar Aug 07 '19 12:08 Belval

Not an issue. For the time being, I just wrote a pywinauto script that runs concurrently and closes any such JPEG error window it finds.

bhutraharish avatar Aug 07 '19 12:08 bhutraharish
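The script mentioned above was not posted, so the following is only a guess at its general shape: a polling loop that closes matching windows while the conversion runs in another thread. It assumes pywinauto's `Desktop(...).windows(title_re=...)` lookup and the window wrapper's `close()` method; the function names and the title pattern are hypothetical, since the dialog's actual title is not given in the thread.

```python
import threading


def find_error_dialogs(title_pattern):
    """Return open top-level windows whose title matches title_pattern.

    Windows only. pywinauto is imported lazily so the rest of this module
    can be loaded on systems where it is not installed.
    """
    from pywinauto import Desktop
    return Desktop(backend="win32").windows(title_re=title_pattern)


def close_dialogs_until(stop_event, finder, poll_s=0.5):
    """Poll finder() and close every window it returns until stop_event is set.

    Returns the number of windows closed.
    """
    closed = 0
    while not stop_event.is_set():
        for window in finder():
            window.close()
            closed += 1
        stop_event.wait(poll_s)  # sleep, but wake immediately when stopped
    return closed


# Example (the ".*JPEG.*" pattern is an assumption about the dialog title):
# stop = threading.Event()
# watcher = threading.Thread(
#     target=close_dialogs_until,
#     args=(stop, lambda: find_error_dialogs(".*JPEG.*")),
#     daemon=True,
# )
# watcher.start()
# ... run the conversion ...
# stop.set()
```

Passing the finder in as a callable keeps the loop independent of pywinauto, so it can be exercised with stand-in objects on any platform.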

Is there any alternative way to avoid this?

nannigath avatar Aug 05 '21 11:08 nannigath

Are you still experiencing this issue? It should have been fixed by https://github.com/Belval/pdf2image/pull/195

Belval avatar Aug 13 '21 03:08 Belval

> Are you still experiencing this issue? It should have been fixed by #195

I still get the "Corrupt JPEG data: Premature end of data segment" error occasionally. I am using pdf2image version 1.16.0.

Has this PR been incorporated into the latest version distributed on PyPI?

nasheedyasin avatar Apr 27 '22 08:04 nasheedyasin

Yes, it should be in the latest version available on PyPI. If you are still experiencing this issue, then it was not fixed.

Unfortunately, I cannot reproduce the issue, so there is not much that I can do.

Belval avatar May 06 '22 22:05 Belval

Actually, updating poppler (to the version suggested in the README) worked. The issue is gone.

nasheedyasin avatar May 07 '22 02:05 nasheedyasin
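Since updating poppler resolved the problem here, it can help to check which poppler build is actually being picked up. The sketch below assumes `pdftoppm -v` prints a line such as "pdftoppm version 22.04.0" (poppler writes this to stderr); both function names are hypothetical helpers, not part of pdf2image.

```python
import re
import subprocess


def parse_poppler_version(text):
    """Extract (major, minor, patch) from pdftoppm's version banner.

    Returns None if no version string is found. A missing patch
    component is treated as 0.
    """
    match = re.search(r"version\s+(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups(default="0"))


def installed_poppler_version(pdftoppm="pdftoppm"):
    """Run `pdftoppm -v` and parse its output.

    Raises FileNotFoundError if the executable cannot be found.
    """
    result = subprocess.run([pdftoppm, "-v"], capture_output=True, text=True)
    # Check both streams so the parse does not depend on which one is used.
    return parse_poppler_version(result.stderr + result.stdout)
```

If several poppler installs exist on the machine, passing the full path to a specific pdftoppm executable shows which one is outdated.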