tabula-sharp
[BUG] - Stream: Area detection hangs on a PDF page
Describe the bug
When I try to extract tables from this 250+ page PDF, the `Detect` method hangs on one specific page (98).

To Reproduce
Using 40927R03.pdf.
I've tried 0.1.3 and 0.1.4-alpha001, and both hang in the same spot.
Using .NET 6.0, C#.
```csharp
// Open the PDF leniently, skipping fonts that cannot be resolved
using var pdoc = PdfDocument.Open(content.Stream, new ParsingOptions { SkipMissingFonts = true, UseLenientParsing = true });
var da = new Tabula.Detectors.SimpleNurminenDetectionAlgorithm();
var area = Tabula.ObjectExtractor.ExtractPage(pdoc, 98 /* hangs on this page */);
var regions = da.Detect(area); // <-- this line hangs
```
Expected behavior
All tables are parsed properly.