PdfPig
Results of the textblock extracted from a PDF vary depending on the operating system.
I built a project that includes the following source code in the Windows .NET 7.0 environment.
using (PdfDocument doc = PdfDocument.Open(bytes))
{
    IEnumerable<Page> pages = doc.GetPages();
    for (int pageNo = StartIndex > 1 ? StartIndex : 1; pageNo <= doc.NumberOfPages; pageNo++)
    {
        Page page = doc.GetPage(pageNo);
        IEnumerable<Word> words = page.GetWords();
        RecursiveXYCut.RecursiveXYCutOptions recursiveXYOpt = new RecursiveXYCut.RecursiveXYCutOptions();
        RecursiveXYCut recursiveXYCut = new RecursiveXYCut(recursiveXYOpt);
        IReadOnlyList<TextBlock> textBlocks = recursiveXYCut.GetBlocks(words);
        foreach (TextBlock textBlock in textBlocks)
        {
            TextBlock2Json(textBlock);
        }
    }
}
Also, I built it for the linux-x64 environment using the command:
> dotnet publish -r linux-x64
Here is the version information.
PdfPig version : 0.1.8
Windows .NET SDK version : 7.0
Linux .NET SDK version : 7.0
I ran both builds on the same PDF file and compared the results.
Windows Results
...
{
    "PAGE": 1,
    "SENTENCE": "Page 1 of 26",
    "WIDTH": 5.144824218749989,
    "HEIGHT": 6.753515625000006
}
...
Linux Results
...
{
    "PAGE": 1,
    "SENTENCE": "Page 1 of 26",
    "WIDTH": 26.193600000000004,
    "HEIGHT": 9.271799999999999
}
...
WIDTH is textBlock.TextLines.First().Words.First().Letters.First().GlyphRectangle.Width
and HEIGHT is textBlock.TextLines.First().Words.First().Letters.First().GlyphRectangle.Height,
that is, the size of the glyph rectangle of the first letter in each text block.
The results differ between the two systems no matter which PDF file I use as input. Why do Windows and Linux produce different values?
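For comparing the two platforms it may help to log a bit more than the rectangle itself. The following is only a sketch: the class name FirstGlyph and its methods are hypothetical helpers, not part of PdfPig, and they assume the TextBlock, Word and Letter types already used in the code above. The sketch walks the same path to the first letter, then prints its value, font name and point size next to the glyph rectangle, formatted with the invariant culture so the output can be diffed across machines.

```csharp
using System;
using System.Linq;
using UglyToad.PdfPig.Content;
using UglyToad.PdfPig.DocumentLayoutAnalysis;

// Hypothetical helper, not part of PdfPig.
public static class FirstGlyph
{
    // Returns the first letter of the block, or null if any level is empty.
    public static Letter? Of(TextBlock block) =>
        block.TextLines.FirstOrDefault()
            ?.Words.FirstOrDefault()
            ?.Letters.FirstOrDefault();

    // One line per block: letter, font name, point size, glyph width and height.
    public static string Describe(TextBlock block)
    {
        Letter? letter = Of(block);
        if (letter is null)
        {
            return "(empty block)";
        }

        var rect = letter.GlyphRectangle;
        // Invariant culture keeps the decimal separator identical on every machine.
        return FormattableString.Invariant(
            $"'{letter.Value}' font={letter.FontName} size={letter.PointSize:R} w={rect.Width:R} h={rect.Height:R}");
    }
}
```

Calling FirstGlyph.Describe(textBlock) inside the foreach loop on both systems gives one comparable line per block. Whether the font name or point size also differs between the platforms is not something this thread establishes; the extra fields are there only to narrow the comparison down.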
@ggaebee can you test with the latest pre-release package and check whether the issue is still there?
#686 might have changed the behaviour
@BobLd I tested with the latest pre-release package. The extreme discrepancies are gone, but the values still differ between the two systems. Could you look into this?
Windows Results
{
    "PAGE": 1,
    "SENTENCE": "Page 1 of 26",
    "WIDTH": 5.144824218749989,
    "HEIGHT": 6.753515625000006
}
Linux Results
{
    "PAGE": 1,
    "SENTENCE": "Page 1 of 26",
    "WIDTH": 4.970507812499989,
    "HEIGHT": 6.678808593750006
}
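To put a number on the remaining gap, here is a small standalone sketch that computes the relative difference between the two pre-release results. The class GlyphDiff and its Relative method are hypothetical names introduced for illustration; the four values are copied from the results above.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for quantifying the Windows/Linux gap.
public static class GlyphDiff
{
    // Absolute difference between a and b, as a fraction of a.
    public static double Relative(double a, double b) => Math.Abs(a - b) / Math.Abs(a);

    public static void Main()
    {
        // Pre-release results for the first glyph of "Page 1 of 26".
        double windowsWidth = 5.144824218749989, linuxWidth = 4.970507812499989;
        double windowsHeight = 6.753515625000006, linuxHeight = 6.678808593750006;

        Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
            "width differs by {0:P2}, height by {1:P2}",
            Relative(windowsWidth, linuxWidth),
            Relative(windowsHeight, linuxHeight)));
    }
}
```

With these inputs the gap comes out at roughly 3.4% in width and 1.1% in height, relative to the Windows values. That is small compared with the 0.1.8 results, but well above floating-point rounding noise.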