ArgumentOutOfRangeException when reading a document

Open sreejith-kulamgarath opened this issue 1 year ago • 7 comments

Here is my code.

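        using System;
        using System.IO;
        using System.Linq;
        using UglyToad.PdfPig;

        // Default parsing options: this is the configuration that throws.
        using var stream = new FileStream(@"JPM_Metals_Weekly_Booste_2024-05-03_4691417 (1)-pages-2.pdf", FileMode.Open);
        using var document = PdfDocument.Open(stream, new ParsingOptions());

        // GetPages() is lazy; Count() forces every page to be parsed.
        var pages = document.GetPages();
        Console.WriteLine(pages.Count());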

I have attached the file I used: JPM_Metals_Weekly_Booste_2024-05-03_4691417 (1)-pages-2.pdf

The version I was using: 0.1.9-alpha-20240628-bac00

sreejith-kulamgarath avatar Jun 28 '24 14:06 sreejith-kulamgarath

@sreejith-kulamgarath what library are you using here? I don't think this is PdfPig code

BobLd avatar Jun 28 '24 17:06 BobLd

@sreejith-kulamgarath what library are you using here? I don't think this is PdfPig code

Sorry about the previous code; I have updated it now.

sreejith-kulamgarath avatar Jun 28 '24 19:06 sreejith-kulamgarath

@sreejith-kulamgarath can you try the following:

PdfDocument.Open(stream, new ParsingOptions() { SkipMissingFonts = true });
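
For context, here is a minimal sketch of that suggestion applied to a complete read. The class and method names (`PageCounter`, `CountPages`) are hypothetical and only there for illustration; `PdfDocument.Open`, `ParsingOptions.SkipMissingFonts` and `GetPages()` are the PdfPig calls already used in this thread.

```csharp
using System.IO;
using System.Linq;
using UglyToad.PdfPig;

// Hypothetical helper, not part of PdfPig.
public static class PageCounter
{
    public static int CountPages(Stream pdf, bool skipMissingFonts = true)
    {
        // SkipMissingFonts asks PdfPig to skip content whose font cannot be
        // loaded instead of throwing; some letters may be missing as a result.
        var options = new ParsingOptions { SkipMissingFonts = skipMissingFonts };

        using var document = PdfDocument.Open(pdf, options);

        // GetPages() is lazy: each page (and its fonts) is parsed during
        // enumeration, so Count() is where a font problem would surface.
        return document.GetPages().Count();
    }
}
```

Usage would be along the lines of `using var stream = File.OpenRead(path); Console.WriteLine(PageCounter.CountPages(stream));`, where `path` points at the PDF in question.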

BobLd avatar Jun 29 '24 07:06 BobLd

@sreejith-kulamgarath anticipating your next question: your document contains many duplicate letters, so you will need to use the following to clean them up:

 var letters = DuplicateOverlappingTextProcessor.Get(page.Letters);
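
A short sketch of how this could be applied per page, to see how many letters are dropped. `LetterStats` and `CountLetters` are hypothetical names used only for illustration; the sketch assumes `DuplicateOverlappingTextProcessor` lives in the `UglyToad.PdfPig.DocumentLayoutAnalysis` namespace.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using UglyToad.PdfPig;
using UglyToad.PdfPig.DocumentLayoutAnalysis;

// Hypothetical helper, not part of PdfPig.
public static class LetterStats
{
    // Returns, for each page, the letter count before and after
    // overlapping duplicates are removed.
    public static List<(int Page, int Before, int After)> CountLetters(Stream pdf)
    {
        using var document = PdfDocument.Open(pdf, new ParsingOptions { SkipMissingFonts = true });

        var result = new List<(int Page, int Before, int After)>();
        foreach (var page in document.GetPages())
        {
            // Letters that overlap an identical letter are treated as duplicates.
            var letters = DuplicateOverlappingTextProcessor.Get(page.Letters);
            result.Add((page.Number, page.Letters.Count(), letters.Count()));
        }

        return result;
    }
}
```

A large gap between `Before` and `After` on a page would suggest the document draws the same text more than once, which is the situation described here.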

BobLd avatar Jun 29 '24 07:06 BobLd

@sreejith-kulamgarath can you try the following:

PdfDocument.Open(stream, new ParsingOptions() { SkipMissingFonts = true });

This worked. But can we get a more user-friendly message than an incomprehensible exception?

sreejith-kulamgarath avatar Jun 29 '24 13:06 sreejith-kulamgarath

What's the full stack trace you're getting?

BobLd avatar Jun 29 '24 14:06 BobLd

Unhandled exception. System.ArgumentOutOfRangeException: Index was out of range. Must be non-negative and less than the size of the collection. (Parameter 'index')
   at System.Collections.Generic.List`1.get_Item(Int32 index)
   at UglyToad.PdfPig.PdfFonts.Parser.Parts.BaseFontRangeParser.Parse(NumericToken numberOfOperations, ITokenScanner scanner, CharacterMapBuilder builder)
   at UglyToad.PdfPig.PdfFonts.Parser.CMapParser.Parse(IInputBytes inputBytes)
   at UglyToad.PdfPig.PdfFonts.Cmap.CMapCache.Parse(IInputBytes bytes)
   at UglyToad.PdfPig.PdfFonts.Parser.Handlers.Type0FontHandler.Generate(DictionaryToken dictionary)
   at UglyToad.PdfPig.PdfFonts.FontFactory.Get(DictionaryToken dictionary)
   at UglyToad.PdfPig.Content.ResourceStore.LoadFontDictionary(DictionaryToken fontDictionary, InternalParsingOptions parsingOptions)
   at UglyToad.PdfPig.Content.ResourceStore.LoadResourceDictionary(DictionaryToken resourceDictionary, InternalParsingOptions parsingOptions)
   at UglyToad.PdfPig.Parser.PageFactory.Create(Int32 number, DictionaryToken dictionary, PageTreeMembers pageTreeMembers, NamedDestinations namedDestinations, InternalParsingOptions parsingOptions)
   at UglyToad.PdfPig.Content.Pages.GetPage(Int32 pageNumber, NamedDestinations namedDestinations, InternalParsingOptions parsingOptions)
   at UglyToad.PdfPig.PdfDocument.GetPage(Int32 pageNumber)
   at UglyToad.PdfPig.PdfDocument.GetPages()+MoveNext()
   at System.Linq.Enumerable.Count[TSource](IEnumerable`1 source)
   at Modules.PdfProcessorTest.Test() in ***Modules/PdfProcessorTest.cs:line 23
   at Program.<Main>$(String[] args) in /**/Program.cs:line 6

sreejith-kulamgarath avatar Jul 01 '24 14:07 sreejith-kulamgarath
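
The stack trace above shows the exception coming from CMap parsing inside `Type0FontHandler`, and being raised while `GetPages()` is enumerated rather than inside `PdfDocument.Open`. Until the library reports something clearer, one possible caller-side workaround is to catch it around the enumeration, print a readable message, and retry with `SkipMissingFonts`. This is only a sketch: `ResilientReader` and its methods are hypothetical names, and catching `ArgumentOutOfRangeException` is based on this one trace, so other malformed files may fail with different exception types.

```csharp
using System;
using System.IO;
using System.Linq;
using UglyToad.PdfPig;

// Hypothetical helper, not part of PdfPig.
public static class ResilientReader
{
    public static int CountPages(string path)
    {
        try
        {
            return Count(path, skipMissingFonts: false);
        }
        catch (ArgumentOutOfRangeException ex)
        {
            // Assumption: for this kind of file the failure comes from a font
            // resource, as in the stack trace in this thread.
            Console.Error.WriteLine(
                $"PdfPig could not load a font in '{path}' ({ex.Message}). " +
                "Retrying with SkipMissingFonts = true; some text may be missing.");
            return Count(path, skipMissingFonts: true);
        }
    }

    private static int Count(string path, bool skipMissingFonts)
    {
        using var stream = File.OpenRead(path);
        using var document = PdfDocument.Open(
            stream, new ParsingOptions { SkipMissingFonts = skipMissingFonts });

        // The try/catch has to cover this enumeration, because pages and
        // their fonts are parsed lazily.
        return document.GetPages().Count();
    }
}
```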