Possible Performance Improvement (for Lattice)

Open Asrst opened this issue 5 years ago • 3 comments

To extract tables with the Lattice flavour, camelot currently uses Ghostscript to generate the page image (the _generate_image method). The image could also be generated with PyMuPDF, a Python wrapper for MuPDF (same license as Ghostscript).

  • It removes an external dependency and makes camelot much easier to install. (In my experience, PyMuPDF is easy to install.)
  • MuPDF is also faster than Ghostscript (around 20-30% less time to generate 300 dpi images).
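
A rough sketch of what the PyMuPDF-based conversion could look like. This is not camelot's code: the names `zoom_for_dpi` and `render_page_to_png` are made up for illustration, and the calls assume a recent PyMuPDF release (older releases used camelCase method names).

```python
def zoom_for_dpi(dpi: float) -> float:
    """PDF user space is 72 units per inch, so the scale factor is dpi / 72."""
    return dpi / 72.0


def render_page_to_png(pdf_path: str, png_path: str,
                       dpi: int = 300, page_number: int = 0) -> None:
    """Render one page of a PDF to a PNG file (hypothetical helper)."""
    # Imported lazily so PyMuPDF stays an optional dependency in this sketch.
    import fitz  # PyMuPDF

    zoom = zoom_for_dpi(dpi)
    # The context manager closes the document once rendering is done.
    with fitz.open(pdf_path) as doc:
        page = doc.load_page(page_number)
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
        pix.save(png_path)
```

Something like `render_page_to_png("input.pdf", "page-1.png")` would then stand in for the Ghostscript call; both file names here are placeholders.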

See this performance comparison of GS vs MuPDF: link

Currently, I see Ghostscript is used only for image conversion. Is there any other specific reason for using Ghostscript?

I'd like to contribute and can raise a PR for this.

Asrst avatar Aug 29 '20 10:08 Asrst

@Asrst Sorry for the late reply. I can work on this with you if you're interested.

Currently, I see Ghostscript is used only for image conversion. Is there any other specific reason for using Ghostscript?

Yes, that's the only reason Ghostscript is currently used. I've also been looking at PyMuPDF, since they have pre-built wheels for all platforms. I'm planning to add support for it as a PDF-to-PNG conversion backend and set it as the default.

vinayak-mehta avatar Oct 01 '20 20:10 vinayak-mehta

Hi, @vinayak-mehta

While working on the above change (adding support for PyMuPDF), I cloned the repo and ran the tests (using pytest) without making any changes.

One test failed on the first run with the following error, but when I reran the tests, all of them passed (again without any changes).

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 
 C:\\Users\\Satya\\AppData\\Local\\Temp\\tmps6dgqy32\\p-1_rotated.pdf

..\..\..\AppData\Local\Programs\Python\Python37\lib\shutil.py:398: PermissionError

I decided to experiment a bit and ran the tests multiple times. In a few runs, one test or another fails, not the same test each time, but always with the same PermissionError as above. The error is not consistently reproducible.

This could indicate some kind of leak (an open file not being closed properly). There are also related issues: #173 and #174.
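
To illustrate the general pattern I suspect (I don't know yet where in camelot a handle stays open, so this is only a sketch with a made-up function and file name): on Windows a file that is still open cannot be deleted, so the temp directory cleanup fails unless every handle is closed first.

```python
import os
import shutil
import tempfile


def write_then_cleanup(data: bytes) -> bool:
    """Write data into a temp directory, then delete the directory."""
    tempdir = tempfile.mkdtemp()
    path = os.path.join(tempdir, "page-1.pdf")  # hypothetical file name
    try:
        # The with-block closes the handle before cleanup runs. A bare
        # open(path, "wb") that is never closed would keep the file locked
        # on Windows, and rmtree could then raise PermissionError.
        with open(path, "wb") as f:
            f.write(data)
    finally:
        shutil.rmtree(tempdir)
    # True when the directory was removed successfully.
    return not os.path.exists(tempdir)
```

On Linux an open file can still be unlinked, which would explain why the error only shows up on Windows.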

Asrst avatar Oct 08 '20 16:10 Asrst

I decided to experiment a bit and ran the tests multiple times. In a few runs, one test or another fails, not the same test each time, but always with the same PermissionError as above. The error is not consistently reproducible.

I haven't faced this error on Linux. I'll try to run the tests on Windows (either on CI or on another laptop).

This could indicate some kind of leak (an open file not being closed properly). There are also related issues: #173 and #174.

Yes, that's possible. I'll look at the code and try to work out where this might be happening. If you fix this while working on your PyMuPDF PR, please raise a separate PR with the fix :)

vinayak-mehta avatar Oct 12 '20 16:10 vinayak-mehta