Incorrect parsing of REVERSE SOLIDUS in literal string
Reporting an Issue Here
When parsing some files, I noticed that some document Info entries show incorrect values. For example, in this file the Producer entry:
- is shown by Acrobat as: C48x Series (PDF - 300X300 dpi)
- is parsed by PDFsharp as: C48x Series (DF - 300X300 dpi), with the "P" missing
Expected Behavior
When parsing a literal string, if a REVERSE SOLIDUS (backslash) is immediately followed by a character that is not listed in Table 3 (section 7.3.4.2 of ISO/DIS 32000-2), the REVERSE SOLIDUS should be ignored but the following character should be kept. For example, the sequence \P should produce P.
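For reference, the escape handling described above can be sketched as a small standalone decoder. This is a minimal sketch under stated assumptions, not PDFsharp's Lexer code: LiteralStringSketch and Decode are hypothetical names, the input is assumed to be the already-extracted content between the outer parentheses (as a .NET string holding one char per byte), and balanced inner parentheses are simply copied through.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of the PDFsharp API.
public static class LiteralStringSketch
{
    // Decodes the content of a PDF literal string (without the outer parentheses),
    // following the escape rules of Table 3 in ISO 32000, section 7.3.4.2.
    public static string Decode(string body)
    {
        var sb = new StringBuilder();
        int i = 0;
        while (i < body.Length)
        {
            char c = body[i];
            if (c != '\\')
            {
                if (c == '\r')
                {
                    // Unescaped CR or CRLF inside a literal string is read as a single LF.
                    sb.Append('\n');
                    i++;
                    if (i < body.Length && body[i] == '\n') i++;
                    continue;
                }
                sb.Append(c);
                i++;
                continue;
            }

            // c is a REVERSE SOLIDUS: inspect the character that follows it.
            i++;
            if (i >= body.Length) break; // Assumption: a trailing lone backslash is dropped.
            char next = body[i];
            switch (next)
            {
                case 'n': sb.Append('\n'); i++; break;
                case 'r': sb.Append('\r'); i++; break;
                case 't': sb.Append('\t'); i++; break;
                case 'b': sb.Append('\b'); i++; break;
                case 'f': sb.Append('\f'); i++; break;
                case '(':
                case ')':
                case '\\':
                    sb.Append(next); i++; break;
                case '\r':
                    // Backslash + end-of-line is a line continuation: both are dropped.
                    i++;
                    if (i < body.Length && body[i] == '\n') i++;
                    break;
                case '\n':
                    i++;
                    break;
                default:
                    if (next >= '0' && next <= '7')
                    {
                        // \ddd: one to three octal digits; high-order overflow is ignored.
                        int value = 0, digits = 0;
                        while (digits < 3 && i < body.Length && body[i] >= '0' && body[i] <= '7')
                        {
                            value = value * 8 + (body[i] - '0');
                            i++;
                            digits++;
                        }
                        sb.Append((char)(value & 0xFF));
                    }
                    else
                    {
                        // Not in Table 3: ignore the backslash but KEEP the character.
                        sb.Append(next);
                        i++;
                    }
                    break;
            }
        }
        return sb.ToString();
    }
}
```

With the hypothetical input C48x Series \(\PDF - 300X300 dpi\), Decode returns C48x Series (PDF - 300X300 dpi), the 31-character value Acrobat shows. Dropping the character after the unknown escape instead would yield the 30-character string reported by the failing test.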
Actual Behavior
When parsing a literal string, if a REVERSE SOLIDUS is immediately followed by a character that is not listed in Table 3 (section 7.3.4.2 of ISO/DIS 32000-2), both the REVERSE SOLIDUS and the following character are dropped.
Steps to Reproduce the Behavior
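using FluentAssertions;
using PdfSharp.Pdf.IO;
using Xunit;
public class LiteralStringTests
{
    [Fact]
    public void ReverseSolidus_with_invalid_following_character_should_be_ignored()
    {
        // Opens the attached sample file and checks the Producer entry of its Info dictionary.
        using var doc = PdfReader.Open(@"Cover-letter-4098208.pdf");
        var producer = doc.Info.Producer;
        producer.Should().Be("C48x Series (PDF - 300X300 dpi)");
    }
}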
The test fails with: Expected producer to be "C48x Series (PDF - 300X300 dpi)" with a length of 31, but "C48x Series (DF - 300X300 dpi)" has a length of 30, differs near "DF " (index 13).
The issue is most likely related to an open question about how to interpret the specification, as explained in this comment in Lexer.cs.