PdfPig
InvalidOperationException calling GetPage() "Failed to parse the content for the page: 32"
Reproduction:
using var doc = UglyToad.PdfPig.PdfDocument.Open(path);
doc.GetPages().ToArray();
File: FailedToParseContentForPage32.pdf
Error:
System.InvalidOperationException: Failed to parse the content for the page: 32
at UglyToad.PdfPig.Content.BasePageFactory`1.Create(Int32 number, DictionaryToken dictionary, PageTreeMembers pageTreeMembers, NamedDestinations namedDestinations) in C:\git\csharp\PdfPig\src\UglyToad.PdfPig\Content\BasePageFactory.cs:line 155
at UglyToad.PdfPig.Content.Pages.GetPage[TPage](IPageFactory`1 pageFactory, Int32 pageNumber, NamedDestinations namedDestinations, ParsingOptions parsingOptions) in C:\git\csharp\PdfPig\src\UglyToad.PdfPig\Content\Pages.cs:line 100
at UglyToad.PdfPig.Content.Pages.GetPage(Int32 pageNumber, NamedDestinations namedDestinations, ParsingOptions parsingOptions) in C:\git\csharp\PdfPig\src\UglyToad.PdfPig\Content\Pages.cs:line 43
at UglyToad.PdfPig.PdfDocument.GetPage(Int32 pageNumber) in C:\git\csharp\PdfPig\src\UglyToad.PdfPig\PdfDocument.cs:line 163
at UglyToad.PdfPig.PdfDocument.GetPages()+MoveNext() in C:\git\csharp\PdfPig\src\UglyToad.PdfPig\PdfDocument.cs:line 213
at System.Collections.Generic.LargeArrayBuilder`1.AddRange(IEnumerable`1 items)
at System.Collections.Generic.EnumerableHelpers.ToArray[T](IEnumerable`1 source)
This file opens fine in Chrome, where page 32 displays as blank. The file is likely corrupt in some way, but it would be nice if PdfPig did not choke on it.
@mikethea1 Page 32: changing the behaviour here is going to be tricky, because it would mean returning a null TPage object from the IPageFactory<TPage> (BasePageFactory<TPage>) Create() method. That would be a major breaking change.
Page 33 of your document also hits a bug, a different one. I'm going to push a fix for that one.
I'll leave the issue open for the moment, because we might end up changing the behaviour for page 32.
I'd probably just approach this by iterating the pages and wrapping each page access in a try/catch. Closing in line with #1095.
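For anyone landing here, the try/catch approach suggested above could look roughly like the sketch below. It is only an illustration, not something PdfPig ships: `TolerantPageReader` and `ReadPageText` are hypothetical names, and it assumes the failure surfaces from `GetPage()` as in the stack trace above. It catches `Exception` broadly because pages 32 and 33 reportedly fail for different reasons; narrow that if you only want to skip `InvalidOperationException`.

```csharp
using System;
using System.Collections.Generic;
using UglyToad.PdfPig;

// Hypothetical helper, not part of PdfPig.
public static class TolerantPageReader
{
    // Returns the text of every page that could be parsed, keyed by page number.
    // Page numbers that threw are collected in failedPages instead of aborting the run.
    public static Dictionary<int, string> ReadPageText(string path, List<int> failedPages)
    {
        var textByPage = new Dictionary<int, string>();

        using var doc = PdfDocument.Open(path);

        // PdfPig page numbers are 1-based.
        for (var pageNumber = 1; pageNumber <= doc.NumberOfPages; pageNumber++)
        {
            try
            {
                // Request pages one at a time so a single bad page
                // does not stop the enumeration, unlike GetPages().
                var page = doc.GetPage(pageNumber);
                textByPage[pageNumber] = page.Text;
            }
            catch (Exception ex)
            {
                failedPages.Add(pageNumber);
                Console.Error.WriteLine($"Skipping page {pageNumber}: {ex.Message}");
            }
        }

        return textByPage;
    }
}
```

Usage would be something like `var failed = new List<int>(); var text = TolerantPageReader.ReadPageText(path, failed);`, after which `failed` lists the pages that could not be parsed. The page content is read inside the loop, while the document is still open, so nothing depends on the `PdfDocument` after it is disposed.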
Looks like one of the compressed images is corrupted. When applying PR #1186 the error becomes