PdfSharpCore
FormatException thrown when attempting to open a file
I have a PDF that cannot be opened with your library (attached to this issue). It throws a FormatException on the following call:
PdfReader.Open(stream, PdfDocumentOpenMode.Import, PdfReadAccuracy.Moderate)
The related stack trace:
FormatException
at System.Number.ThrowOverflowOrFormatException(ParsingStatus status, TypeCode type)
at System.Number.ParseInt32(ReadOnlySpan`1 value, NumberStyles styles, NumberFormatInfo info)
at System.Int32.Parse(String s, NumberStyles style)
at PdfSharpCore.Pdf.IO.Lexer.ScanHexadecimalString()
at PdfSharpCore.Pdf.IO.Lexer.ScanNextToken()
at PdfSharpCore.Pdf.IO.Parser.ScanNextToken()
at PdfSharpCore.Pdf.IO.Parser.ParseObject(Symbol stop)
at PdfSharpCore.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences)
at PdfSharpCore.Pdf.IO.Parser.ParseObject(Symbol stop)
at PdfSharpCore.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences)
at PdfSharpCore.Pdf.IO.Parser.ReadObject(PdfObject pdfObject, PdfObjectID objectID, Boolean includeReferences, Boolean fromObjecStream)
at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider passwordProvider, PdfReadAccuracy accuracy)
at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfReadAccuracy accuracy)
at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, PdfDocumentOpenMode openmode, PdfReadAccuracy accuracy)
I think this issue is related to the encoding of the file. FYI, the file opens in Chrome and with the PDF.js library, but not in Slack, for example, where it was detected as a binary file.
23e9c85a-71d9-4030-a84c-556f31eca207.pdf
PS: The file was generated automatically via this site: https://www.cvgenerator.co.uk/
What's the value of the variable "value" in ParseInt32?
value is a string built by passing a char array of length 2 to the string constructor:
char[] chArray = new char[2];
chArray[0] = char.ToUpper(this._currChar); // _currChar = (char)56, i.e. '8'
chArray[1] = char.ToUpper(this._nextChar); // _nextChar = (char)32, i.e. ' ' (space)
int.Parse(new string(chArray), NumberStyles.AllowHexSpecifier); // This is where it fails
FYI, I cannot access the value parameter directly at the point of the call: https://i.imgur.com/vK2Rpbx.png
Try:
int.Parse(new string(chArray).Trim(), NumberStyles.AllowHexSpecifier); // the failing call, with Trim() added
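For reference, the failing pair can be reproduced outside the library. This is a standalone sketch, assuming a .NET 6+ top-level program that does not use PdfSharpCore at all. It parses the two reported characters, (char)56 and (char)32 (i.e. "8 "). NumberStyles.AllowHexSpecifier on its own permits no whitespace, so the parse fails, while calling Trim() first or using NumberStyles.HexNumber (which adds AllowLeadingWhite and AllowTrailingWhite) makes it succeed.

```csharp
using System;
using System.Globalization;

// The two characters seen in the debugger: (char)56 is '8', (char)32 is a space.
string pair = new string(new[] { (char)56, (char)32 });

// AllowHexSpecifier alone allows no leading/trailing whitespace, so "8 " is rejected.
bool strictOk = int.TryParse(pair, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out _);

// Trimming first, as suggested above, removes the space and "8" parses as 8.
bool trimmedOk = int.TryParse(pair.Trim(), NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out int trimmed);

// NumberStyles.HexNumber = AllowLeadingWhite | AllowTrailingWhite | AllowHexSpecifier,
// so it tolerates the space without an explicit Trim().
bool hexNumberOk = int.TryParse(pair, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out int hexNumber);

Console.WriteLine($"AllowHexSpecifier: {strictOk}; Trim(): {trimmedOk} ({trimmed}); HexNumber: {hexNumberOk} ({hexNumber})");
// AllowHexSpecifier: False; Trim(): True (8); HexNumber: True (8)
```

Either workaround turns the lone '8' into the byte 0x08. The PDF specification says whitespace inside a hexadecimal string is ignored, so that '8' should instead be the high digit of a byte paired with the next hex digit. Trimming therefore hides the exception rather than decoding the string correctly.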
I added the .Trim() call you suggested and ended up with the same error, but with a different character code:
https://i.imgur.com/zf7EynJ.png
FormatException
at System.Number.ThrowOverflowOrFormatException(ParsingStatus status, TypeCode type)
at System.Number.ParseInt32(ReadOnlySpan`1 value, NumberStyles styles, NumberFormatInfo info)
at System.Int32.Parse(String s, NumberStyles style)
at PdfSharpCore.Pdf.IO.Lexer.ScanHexadecimalString() in PdfSharpCore\Pdf.IO\Lexer.cs:line 569
at PdfSharpCore.Pdf.IO.Lexer.ScanNextToken() in PdfSharpCore\Pdf.IO\Lexer.cs:line 134
at PdfSharpCore.Pdf.IO.Parser.ScanNextToken() in PdfSharpCore\Pdf.IO\Parser.cs:line 562
at PdfSharpCore.Pdf.IO.Parser.ParseObject(Symbol stop) in PdfSharpCore\Pdf.IO\Parser.cs:line 441
at PdfSharpCore.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences) in PdfSharpCore\Pdf.IO\Parser.cs:line 405
at PdfSharpCore.Pdf.IO.Parser.ParseObject(Symbol stop) in PdfSharpCore\Pdf.IO\Parser.cs:line 532
at PdfSharpCore.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences) in PdfSharpCore\Pdf.IO\Parser.cs:line 405
at PdfSharpCore.Pdf.IO.Parser.ReadObject(PdfObject pdfObject, PdfObjectID objectID, Boolean includeReferences, Boolean fromObjecStream) in PdfSharpCore\Pdf.IO\Parser.cs:line 194
at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider passwordProvider, PdfReadAccuracy accuracy) in PdfSharpCore\Pdf.IO\PdfReader.cs:line 541
at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfReadAccuracy accuracy) in PdfSharpCore\Pdf.IO\PdfReader.cs:line 371
at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, PdfDocumentOpenMode openmode) in PdfSharpCore\Pdf.IO\PdfReader.cs:line 331
And if I also trim the remaining character (62, i.e. '>') (screenshot: https://i.imgur.com/wL8GnsW.png), another exception is thrown:
PdfSharpCore.Pdf.IO.PdfReaderException: Unexpected character '0x000a' in PDF stream. The file may be corrupted. If you think this is a bug in PDFsharp, please send us your PDF file.
at PdfSharpCore.Internal.ParserDiagnostics.ThrowParserException(String message) in PdfSharpCore\Internal\Diagnostics.cs:line 66
at PdfSharpCore.Internal.ParserDiagnostics.HandleUnexpectedCharacter(Char ch) in PdfSharpCore\Internal\Diagnostics.cs:line 80
at PdfSharpCore.Pdf.IO.Lexer.ScanNextToken() in PdfSharpCore\Pdf.IO\Lexer.cs:line 143
at PdfSharpCore.Pdf.IO.Parser.ScanNextToken() in PdfSharpCore\Pdf.IO\Parser.cs:line 562
at PdfSharpCore.Pdf.IO.Parser.ParseObject(Symbol stop) in PdfSharpCore\Pdf.IO\Parser.cs:line 441
at PdfSharpCore.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences) in PdfSharpCore\Pdf.IO\Parser.cs:line 405
at PdfSharpCore.Pdf.IO.Parser.ParseObject(Symbol stop) in PdfSharpCore\Pdf.IO\Parser.cs:line 532
at PdfSharpCore.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences) in PdfSharpCore\Pdf.IO\Parser.cs:line 405
at PdfSharpCore.Pdf.IO.Parser.ReadObject(PdfObject pdfObject, PdfObjectID objectID, Boolean includeReferences, Boolean fromObjecStream) in PdfSharpCore\Pdf.IO\Parser.cs:line 194
at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider passwordProvider, PdfReadAccuracy accuracy) in PdfSharpCore\Pdf.IO\PdfReader.cs:line 541
at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfReadAccuracy accuracy) in PdfSharpCore\Pdf.IO\PdfReader.cs:line 371
at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, PdfDocumentOpenMode openmode) in PdfSharpCore\Pdf.IO\PdfReader.cs:line 331
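Character 62 is '>', the closing delimiter of a hexadecimal string, so the lexer seems to have paired the last digit of an odd-length string with the delimiter. Together with the '8' + space pair, this points to two cases the PDF specification defines for hexadecimal strings: whitespace between digits is ignored, and if there is an odd number of digits the missing final digit is assumed to be 0 (so <901FA> decodes to 90 1F A0). Below is a minimal sketch of decoding that honours both rules. It is not PdfSharpCore's Lexer code: DecodeHexString, IsPdfWhiteSpace and HexValue are hypothetical names, and it assumes the caller has already stripped the surrounding < and > (a .NET 6+ top-level program).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not PdfSharpCore API): decodes the text between '<' and '>'.
static byte[] DecodeHexString(string body)
{
    var bytes = new List<byte>();
    int high = -1; // pending high nibble; -1 means none yet
    foreach (char c in body)
    {
        if (IsPdfWhiteSpace(c)) continue; // whitespace inside a hex string is ignored
        int nibble = HexValue(c);
        if (nibble < 0)
            throw new FormatException($"Invalid character '{c}' (0x{(int)c:x4}) in hex string.");
        if (high < 0)
        {
            high = nibble;
        }
        else
        {
            bytes.Add((byte)((high << 4) | nibble));
            high = -1;
        }
    }
    if (high >= 0)
        bytes.Add((byte)(high << 4)); // odd digit count: missing final digit is assumed to be 0
    return bytes.ToArray();
}

// PDF white-space characters: NUL, TAB, LF, FF, CR and SPACE.
static bool IsPdfWhiteSpace(char c) => c is '\0' or '\t' or '\n' or '\f' or '\r' or ' ';

// Value of a single hex digit, or -1 if c is not a hex digit.
static int HexValue(char c) =>
    c >= '0' && c <= '9' ? c - '0' :
    c >= 'A' && c <= 'F' ? c - 'A' + 10 :
    c >= 'a' && c <= 'f' ? c - 'a' + 10 : -1;

byte[] odd = DecodeHexString("901FA");     // odd digit count, the example from the PDF specification
byte[] spaced = DecodeHexString("8 0 41"); // a digit followed by a space, as in the failing pair
Console.WriteLine(BitConverter.ToString(odd));    // 90-1F-A0
Console.WriteLine(BitConverter.ToString(spaced)); // 80-41
```

In a real lexer the loop would stop at the closing '>' instead of receiving it; that is the point where a pending digit needs the 0-padding rather than being paired with the delimiter.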