InvalidOperationException calling GetPage() "Failed to parse the content for the page: 32"

Open mikethea1 opened this issue 11 months ago • 1 comments

Reproduction:

using System.Linq; // needed for ToArray()
using var doc = UglyToad.PdfPig.PdfDocument.Open(path);
doc.GetPages().ToArray();

File: FailedToParseContentForPage32.pdf

Error:

System.InvalidOperationException: Failed to parse the content for the page: 32
   at UglyToad.PdfPig.Content.BasePageFactory`1.Create(Int32 number, DictionaryToken dictionary, PageTreeMembers pageTreeMembers, NamedDestinations namedDestinations) in C:\git\csharp\PdfPig\src\UglyToad.PdfPig\Content\BasePageFactory.cs:line 155
   at UglyToad.PdfPig.Content.Pages.GetPage[TPage](IPageFactory`1 pageFactory, Int32 pageNumber, NamedDestinations namedDestinations, ParsingOptions parsingOptions) in C:\git\csharp\PdfPig\src\UglyToad.PdfPig\Content\Pages.cs:line 100
   at UglyToad.PdfPig.Content.Pages.GetPage(Int32 pageNumber, NamedDestinations namedDestinations, ParsingOptions parsingOptions) in C:\git\csharp\PdfPig\src\UglyToad.PdfPig\Content\Pages.cs:line 43
   at UglyToad.PdfPig.PdfDocument.GetPage(Int32 pageNumber) in C:\git\csharp\PdfPig\src\UglyToad.PdfPig\PdfDocument.cs:line 163
   at UglyToad.PdfPig.PdfDocument.GetPages()+MoveNext() in C:\git\csharp\PdfPig\src\UglyToad.PdfPig\PdfDocument.cs:line 213
   at System.Collections.Generic.LargeArrayBuilder`1.AddRange(IEnumerable`1 items)
   at System.Collections.Generic.EnumerableHelpers.ToArray[T](IEnumerable`1 source)

This file opens fine in Chrome; page 32 displays as blank. The file is likely corrupt in some way, but it would be nice if PdfPig did not choke on it.

mikethea1, Dec 10 '24 13:12

@mikethea1 Page 32: changing the behaviour here is going to be tricky, because it would mean returning a null TPage object from the Create() method of IPageFactory<TPage> (BasePageFactory<TPage>). This would be a major breaking change.

Page 33 of your document also hits a bug, a different one. I'm going to push a fix for that one.

I'll leave the issue open for the moment, because we might end up changing the behaviour for page 32.

BobLd, Mar 02 '25 11:03

I'd probably just approach this by iterating the pages and try/catch-ing each page access. Closing in line with #1095
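
A minimal sketch of that approach, assuming the document is opened by the caller and stays open while the returned pages are in use. `SafePageReader` and `ReadPages` are hypothetical names for illustration, not part of PdfPig; only `NumberOfPages` and `GetPage` are library calls. It catches `Exception` broadly because pages 32 and 33 fail for different reasons; narrow it to `InvalidOperationException` if only the error reported above should be tolerated.

```csharp
using System;
using System.Collections.Generic;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;

// Hypothetical helper, not part of PdfPig.
public static class SafePageReader
{
    // Returns every page that parses; records the page number and
    // exception for each page that does not.
    public static List<Page> ReadPages(
        PdfDocument doc,
        List<(int PageNumber, Exception Error)> failures)
    {
        var pages = new List<Page>();

        // PdfPig page numbers are 1-based.
        for (var number = 1; number <= doc.NumberOfPages; number++)
        {
            try
            {
                pages.Add(doc.GetPage(number));
            }
            catch (Exception ex)
            {
                // Skip the unreadable page and keep going.
                failures.Add((number, ex));
            }
        }

        return pages;
    }
}
```

Called as `SafePageReader.ReadPages(doc, failures)` in place of `doc.GetPages().ToArray()`, this should leave page 32 of the attached file in `failures` rather than aborting the whole enumeration.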

EliotJones, Jul 20 '25 01:07

Looks like one of the compressed images is corrupted. With PR #1186 applied, the error becomes:

[Image]

rhuijben, Oct 14 '25 13:10