
PdfReaderException: Unexpected character '0x0050' in PDF stream

Open fishrdev opened this issue 3 years ago • 2 comments

PdfReaderException when opening file

Opening a PDF file with PdfSharpCore throws the following exception when calling the library method PdfReader.Open(Stream stream, PdfDocumentOpenMode openmode, PdfReadAccuracy accuracy):

Exception Message

PdfSharpCore.Pdf.IO.PdfReaderException: Unexpected character '0x0050' in PDF stream.
The file may be corrupted. If you think this is a bug in PDFsharp, please send us your PDF file.

Stacktrace

at PdfSharpCore.Internal.ParserDiagnostics.ThrowParserException(String message)
at PdfSharpCore.Internal.ParserDiagnostics.HandleUnexpectedCharacter(Char ch)
at PdfSharpCore.Pdf.IO.Lexer.ScanLiteralString()
at PdfSharpCore.Pdf.IO.Lexer.ScanNextToken()
at PdfSharpCore.Pdf.IO.Parser.ScanNextToken()
at PdfSharpCore.Pdf.IO.Parser.ParseObject(Symbol stop)
at PdfSharpCore.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences)
at PdfSharpCore.Pdf.IO.Parser.ReadObject(PdfObject pdfObject, PdfObjectID objectID, Boolean includeReferences, Boolean fromObjecStream)
at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider passwordProvider, PdfReadAccuracy accuracy)
at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfReadAccuracy accuracy)
at PdfSharpCore.Pdf.IO.PdfReader.Open(Stream stream, PdfDocumentOpenMode openmode, PdfReadAccuracy accuracy)
at __RETAINED__ in __RETAINED__:line 34
at __RETAINED__ in __RETAINED__:line 49
at __RETAINED__ in __RETAINED__:line 115

PDF Specifics

I am unable to provide the complete PDF file, but here are some specifics that might have led to this exception:

  • The PDF file is marked as PDF-1.4.
  • The relevant object was probably created by a Samsung printer.
  • The line with the Producer contains an unescaped backslash, i.e. a backslash that does not start a valid escape sequence.
  • 0x0050 is the ASCII character P, which is scanned right after the unescaped backslash, leading to the "unexpected character" error.
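
To make the last two points concrete, here is a minimal diagnostic sketch in Python. It is not part of PdfSharpCore; the name find_invalid_escapes and the set of accepted characters are assumptions based on Table 3.2 of the PDF Reference. It lists every backslash in a literal string that is followed by a character outside that table:

```python
# Bytes that may legally follow a backslash in a PDF literal string
# (PDF Reference, Table 3.2): named escapes, octal digits, and a line break.
VALID_ESCAPE_FOLLOWERS = set(b"nrtbf()\\01234567\r\n")


def find_invalid_escapes(literal: bytes) -> list[tuple[int, int]]:
    """Return (offset of backslash, following byte) for each invalid escape."""
    hits = []
    i = 0
    while i < len(literal) - 1:
        if literal[i] == 0x5C:  # backslash
            follower = literal[i + 1]
            if follower not in VALID_ESCAPE_FOLLOWERS:
                hits.append((i, follower))
            i += 2  # skip the escaped byte so an escaped backslash is not re-read
        else:
            i += 1
    return hits


# Contents of the /Producer literal from the object in this issue.
producer = rb"Created By SAMSUNG MFP (\PDF - 200X200 dpi)"
for offset, follower in find_invalid_escapes(producer):
    print(f"offset {offset}: backslash followed by 0x{follower:04x} ({chr(follower)!r})")
```

For the Producer string above, this reports a single hit: the backslash at offset 24, followed by 0x0050 ('P'), which matches the character named in the exception message.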

Here is the relevant object:

3 0 obj
<</CreationDate (D:20220624122546+00'00')
/Producer (Created By SAMSUNG MFP (\PDF - 200X200 dpi))
/Creator (Created By SAMSUNG MFP)
>>
endobj

The PDF contains the following header information:

%PDF-1.4
%äðíø

Related Issue

This might be related to issue #84, where the problem also appears to be a string literal. As mentioned in that issue, the PDF specification states explicitly:

Within a literal string, the backslash (\) is used as an escape character for various purposes, such as to include newline characters, nonprinting ASCII characters, unbalanced parentheses, or the backslash character itself in the string. The character immediately following the backslash determines its precise interpretation (see Table 3.2). If the character following the backslash is not one of those shown in the table, the backslash is ignored.
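
The quoted rule can be sketched as a small lenient decoder. This is an illustrative Python sketch under stated assumptions, not the PdfSharpCore lexer and not the hotfix from the pull request: decode_literal_string is a hypothetical helper that takes the bytes between the outer parentheses, and it leaves out the normalization of unescaped line breaks for brevity.

```python
# Named escapes from Table 3.2 of the PDF Reference.
NAMED_ESCAPES = {
    ord("n"): b"\n", ord("r"): b"\r", ord("t"): b"\t",
    ord("b"): b"\b", ord("f"): b"\f",
    ord("("): b"(", ord(")"): b")", ord("\\"): b"\\",
}


def decode_literal_string(body: bytes) -> bytes:
    """Decode the contents of a PDF literal string, ignoring unknown escapes."""
    out = bytearray()
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch != 0x5C:  # ordinary byte: copy through
            out.append(ch)
            i += 1
            continue
        i += 1  # consume the backslash
        if i >= n:
            break  # trailing backslash with nothing after it: drop it
        nxt = body[i]
        if nxt in NAMED_ESCAPES:
            out += NAMED_ESCAPES[nxt]
            i += 1
        elif 0x30 <= nxt <= 0x37:
            # Octal escape: one to three octal digits.
            j = i
            while j < n and j < i + 3 and 0x30 <= body[j] <= 0x37:
                j += 1
            out.append(int(body[i:j], 8) & 0xFF)  # high-order overflow is ignored
            i = j
        elif nxt in (0x0D, 0x0A):
            # Backslash before a line break: the string continues on the next line.
            i += 1
            if nxt == 0x0D and i < n and body[i] == 0x0A:
                i += 1
        else:
            # Not in Table 3.2: the backslash is ignored. The following byte
            # is left in place and copied as an ordinary byte on the next pass.
            pass
    return bytes(out)


print(decode_literal_string(rb"Created By SAMSUNG MFP (\PDF - 200X200 dpi)"))
```

With this rule applied, the Producer value from this issue decodes to "Created By SAMSUNG MFP (PDF - 200X200 dpi)" instead of raising an error: the stray backslash is dropped and the P is kept.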

fishrdev • Aug 25 '22 10:08

Hotfix for literal scanning: the lexer now reads a single unescaped backslash without throwing the exception. Side effects are unknown.

Edit: removed the file; see the pull request.

fishrdev • Aug 25 '22 11:08

@ststeiger I am still facing this sort of exception. Mine is "PdfSharpCore.Pdf.IO.PdfReaderException: Unexpected character '0x00bf' in PDF stream. The file may be corrupted. If you think this is a bug in PDFsharp, please send us your PDF file."

When I check my input PDF file, the exception occurs at a blank page.

a-k-t-e-r • Jan 17 '23 11:01