[BUG] - Stream: Area detection hangs on PDF page

Open kirk-marple opened this issue 1 year ago • 7 comments

Describe the bug When attempting to extract tables from this 250+ page PDF, I found that extraction hangs on one specific page (98), inside the 'Detect' method.

To Reproduce Using 40927R03.pdf

I've tried 0.1.3 and 0.1.4-alpha001, and both hang in the same spot.

Using .NET 6.0, C#.

using UglyToad.PdfPig; // PdfDocument, ParsingOptions

// content.Stream is the stream holding the PDF bytes
using var pdoc = PdfDocument.Open(content.Stream, new ParsingOptions { SkipMissingFonts = true, UseLenientParsing = true });
var da = new Tabula.Detectors.SimpleNurminenDetectionAlgorithm();

var area = Tabula.ObjectExtractor.ExtractPage(pdoc, 98 /* hangs on this page */);
var regions = da.Detect(area); // <-- this line hangs
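
While the hang is being investigated, a caller could wrap the call in a timeout so that one bad page does not block the rest of the document. The following is only a minimal sketch: `HangGuard` and `TryRun` are hypothetical names, not part of tabula-sharp, and the sleeping lambda stands in for `() => da.Detect(area)`. It does not fix the underlying bug, and the abandoned work keeps occupying a thread-pool thread until the process exits.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for the hanging call. In the repro above this would be
// () => da.Detect(area), with the result read from the out parameter.
bool finished = HangGuard.TryRun(
    () => { Thread.Sleep(Timeout.Infinite); return 0; },
    TimeSpan.FromMilliseconds(200),
    out int _);

Console.WriteLine(finished ? "finished" : "timed out");

// Hypothetical helper, not part of tabula-sharp.
public static class HangGuard
{
    // Runs `work` on a thread-pool thread and waits at most `timeout`.
    // Returns true and sets `result` if the work completed in time.
    // Returns false on timeout; the work is NOT cancelled and keeps running
    // in the background. An exception thrown by `work` surfaces here as an
    // AggregateException.
    public static bool TryRun<T>(Func<T> work, TimeSpan timeout, out T result)
    {
        Task<T> task = Task.Run(work);
        if (task.Wait(timeout))
        {
            result = task.Result;
            return true;
        }

        result = default!;
        return false;
    }
}
```

Looping over the page numbers with a guard like this would also show whether page 98 is the only page that triggers the hang.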

Expected behavior To properly parse all tables.

kirk-marple Jan 05 '24 01:01