
Bullet points separated

Open christopher5106 opened this issue 4 years ago • 0 comments

When running the following code:

var words = page.GetWords(NearestNeighbourWordExtractor.Instance).ToList();
var blocks = DocstrumBoundingBoxes.Instance.GetBlocks(words);
var orderedBlocks = new UnsupervisedReadingOrderDetector(
    spatialReasoningRule: UnsupervisedReadingOrderDetector.SpatialReasoningRules.RowWise,
    useRenderingOrder: false).Get(blocks).ToList();
var finalBlocks = new List<UglyToad.PdfPig.DocumentLayoutAnalysis.TextBlock>(orderedBlocks);

on the file https://drive.google.com/file/d/1fWXuDZpO_iUgwNhqCds60WhaQYFXJkOK/view?usp=sharing

I get the following text:

This service is free and comes with the following additional benefits:

● ●

Payments are received faster than waiting for a cheque to arrive in the mail. No more lost/misplaced/fraudulent cheques.

It happens many times that the bullet points or the numbered points are separated from their text.

christopher5106, Feb 22 '21 12:02