FormatException thrown when attempting to open a file

Open franklbt opened this issue 3 years ago • 4 comments

I have a PDF (attached to this issue) that cannot be opened with your library. It throws a FormatException on the following call:

PdfReader.Open(stream, PdfDocumentOpenMode.Import, PdfReadAccuracy.Moderate)

The related stack trace:

FormatException
   at System.Number.ThrowOverflowOrFormatException(ParsingStatus status, TypeCode type)
   at System.Number.ParseInt32(ReadOnlySpan`1 value, NumberStyles styles, NumberFormatInfo info)
   at System.Int32.Parse(String s, NumberStyles style)
   at PdfSharpCore.Pdf.IO.Lexer.ScanHexadecimalString()
   at PdfSharpCore.Pdf.IO.Lexer.ScanNextToken()
   at PdfSharpCore.Pdf.IO.Parser.ScanNextToken()
   at PdfSharpCore.Pdf.IO.Parser.ParseObject(Symbol stop)
   at PdfSharpCore.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences)
   at PdfSharpCore.Pdf.IO.Parser.ParseObject(Symbol stop)
   at PdfSharpCore.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences)
   at PdfSharpCore.Pdf.IO.Parser.ReadObject(PdfObject pdfObject, PdfObjectID objectID, Boolean includeReferences, Boolean fromObjecStream)
   at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider passwordProvider, PdfReadAccuracy accuracy)
   at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfReadAccuracy accuracy)
   at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, PdfDocumentOpenMode openmode, PdfReadAccuracy accuracy)

I think this issue is related to the encoding of the file. FYI, the file can be opened in Chrome and with the PDF.js library, but not in Slack, for example, where it was detected as a binary file.

23e9c85a-71d9-4030-a84c-556f31eca207.pdf

PS: The file was generated automatically via this site: https://www.cvgenerator.co.uk/

franklbt avatar Apr 05 '22 10:04 franklbt

What's the value of the variable "value" in ParseInt32?

ststeiger avatar Apr 05 '22 10:04 ststeiger

value is a string constructed by calling the string constructor with a char array of length 2:

char[] chArray = new char[2];
chArray[0] = char.ToUpper(this._currChar); // _currChar = (char)56, i.e. '8'
chArray[1] = char.ToUpper(this._nextChar); // _nextChar = (char)32, i.e. a space
int.Parse(new string(chArray), NumberStyles.AllowHexSpecifier); // this is the call that fails

FYI, I cannot directly inspect the value parameter at the point of the call: https://i.imgur.com/vK2Rpbx.png

franklbt avatar Apr 05 '22 10:04 franklbt

Try int.Parse((new string(chArray)).Trim(), NumberStyles.AllowHexSpecifier) on the line that fails.

ststeiger avatar Apr 05 '22 11:04 ststeiger

I tried adding the .Trim() call you suggested and ended up with the same error, but with a different char code:

https://i.imgur.com/zf7EynJ.png

FormatException
   at System.Number.ThrowOverflowOrFormatException(ParsingStatus status, TypeCode type)
   at System.Number.ParseInt32(ReadOnlySpan`1 value, NumberStyles styles, NumberFormatInfo info)
   at System.Int32.Parse(String s, NumberStyles style)
   at PdfSharpCore.Pdf.IO.Lexer.ScanHexadecimalString() in PdfSharpCore\Pdf.IO\Lexer.cs:line 569
   at PdfSharpCore.Pdf.IO.Lexer.ScanNextToken() in PdfSharpCore\Pdf.IO\Lexer.cs:line 134
   at PdfSharpCore.Pdf.IO.Parser.ScanNextToken() in PdfSharpCore\Pdf.IO\Parser.cs:line 562
   at PdfSharpCore.Pdf.IO.Parser.ParseObject(Symbol stop) in PdfSharpCore\Pdf.IO\Parser.cs:line 441
   at PdfSharpCore.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences) in PdfSharpCore\Pdf.IO\Parser.cs:line 405
   at PdfSharpCore.Pdf.IO.Parser.ParseObject(Symbol stop) in PdfSharpCore\Pdf.IO\Parser.cs:line 532
   at PdfSharpCore.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences) in PdfSharpCore\Pdf.IO\Parser.cs:line 405
   at PdfSharpCore.Pdf.IO.Parser.ReadObject(PdfObject pdfObject, PdfObjectID objectID, Boolean includeReferences, Boolean fromObjecStream) in PdfSharpCore\Pdf.IO\Parser.cs:line 194
   at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider passwordProvider, PdfReadAccuracy accuracy) in PdfSharpCore\Pdf.IO\PdfReader.cs:line 541
   at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfReadAccuracy accuracy) in PdfSharpCore\Pdf.IO\PdfReader.cs:line 371
   at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, PdfDocumentOpenMode openmode) in PdfSharpCore\Pdf.IO\PdfReader.cs:line 331

And if I also trim the remaining char (62, i.e. '>'; see https://i.imgur.com/wL8GnsW.png), another exception is thrown:

PdfSharpCore.Pdf.IO.PdfReaderException: Unexpected character '0x000a' in PDF stream. The file may be corrupted. If you think this is a bug in PDFsharp, please send us your PDF file.
   at PdfSharpCore.Internal.ParserDiagnostics.ThrowParserException(String message) in PdfSharpCore\Internal\Diagnostics.cs:line 66
   at PdfSharpCore.Internal.ParserDiagnostics.HandleUnexpectedCharacter(Char ch) in PdfSharpCore\Internal\Diagnostics.cs:line 80
   at PdfSharpCore.Pdf.IO.Lexer.ScanNextToken() in PdfSharpCore\Pdf.IO\Lexer.cs:line 143
   at PdfSharpCore.Pdf.IO.Parser.ScanNextToken() in PdfSharpCore\Pdf.IO\Parser.cs:line 562
   at PdfSharpCore.Pdf.IO.Parser.ParseObject(Symbol stop) in PdfSharpCore\Pdf.IO\Parser.cs:line 441
   at PdfSharpCore.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences) in PdfSharpCore\Pdf.IO\Parser.cs:line 405
   at PdfSharpCore.Pdf.IO.Parser.ParseObject(Symbol stop) in PdfSharpCore\Pdf.IO\Parser.cs:line 532
   at PdfSharpCore.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences) in PdfSharpCore\Pdf.IO\Parser.cs:line 405
   at PdfSharpCore.Pdf.IO.Parser.ReadObject(PdfObject pdfObject, PdfObjectID objectID, Boolean includeReferences, Boolean fromObjecStream) in PdfSharpCore\Pdf.IO\Parser.cs:line 194
   at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider passwordProvider, PdfReadAccuracy accuracy) in PdfSharpCore\Pdf.IO\PdfReader.cs:line 541
   at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfReadAccuracy accuracy) in PdfSharpCore\Pdf.IO\PdfReader.cs:line 371
   at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, PdfDocumentOpenMode openmode) in PdfSharpCore\Pdf.IO\PdfReader.cs:line 331

franklbt avatar Apr 05 '22 13:04 franklbt
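
For anyone landing here with the same trace: NumberStyles.AllowHexSpecifier on its own does not permit whitespace, so parsing the two-character string "8 " (char codes 56 and 32) throws a FormatException. The reported char codes (56, 32, then 62) would be consistent with a hexadecimal string that has an odd number of digits followed by a space before the closing '>', though that is only a guess from the screenshots. The PDF specification says that whitespace inside a hexadecimal string is ignored and that a missing final digit is treated as 0. Below is a minimal sketch of a decoder that follows those two rules. HexStringSketch and Decode are hypothetical names for illustration, not part of PdfSharpCore, and this is not the library's actual lexer code.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for illustration only; not part of PdfSharpCore.
public static class HexStringSketch
{
    // PDF whitespace characters: NUL, TAB, LF, FF, CR, SPACE.
    private static bool IsPdfWhiteSpace(char c)
    {
        return c == '\0' || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
    }

    // 'body' is the text between '<' and '>' of a PDF hexadecimal string.
    public static byte[] Decode(string body)
    {
        var digits = new List<char>();
        foreach (char c in body)
        {
            if (IsPdfWhiteSpace(c))
                continue; // whitespace inside a hex string is ignored

            if (!Uri.IsHexDigit(c))
                throw new FormatException("Unexpected character in hex string: 0x" + ((int)c).ToString("x4"));

            digits.Add(c);
        }

        // An odd number of digits means the final digit is assumed to be 0.
        if (digits.Count % 2 != 0)
            digits.Add('0');

        var bytes = new byte[digits.Count / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            // Each pair now holds exactly two hex digits, so AllowHexSpecifier is safe.
            string pair = new string(new[] { digits[2 * i], digits[2 * i + 1] });
            bytes[i] = byte.Parse(pair, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);
        }
        return bytes;
    }
}
```

With this sketch, BitConverter.ToString(HexStringSketch.Decode("48 65 6C 6C 6F 8 ")) gives "48-65-6C-6C-6F-80": the spaces are skipped and the trailing lone 8 is read as 80. Filtering whitespace before pairing the digits avoids the situation in the trace above, where a digit and a space end up in the same two-character string.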