
Add PDF syntax to Rouge

Open · petervwyatt opened this issue 1 year ago · 5 comments

Please accept this lexer for PDF syntax (a.k.a. "COS syntax").

PDF (Portable Document Format) is an object-based, declarative page description language that is, in practice, a random-access binary (non-text) format. It is formally defined by ISO 32000-2:2020, as corrected by its errata (please do not refer to outdated legacy Adobe documentation!). However, with care, text-centric PDFs (whole files or fragments), such as those used in documentation, can be created. This token-based, forward-lexing lexer is not intended for binary real-world PDFs, because that is not how real PDFs need to be lexed (and they would likely trigger Ruby UTF-8 errors anyway!).
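
For readers unfamiliar with COS syntax, the following gives a rough idea of what token-based forward lexing of text-only PDF looks like. It is a standalone sketch using only Ruby's standard `StringScanner`; it is not the Rouge lexer submitted in this PR, the module name `CosSketch` and the token-type symbols are made up for illustration, and it deliberately ignores binary stream data. Runs of regular characters are classified as numbers or keywords (`obj`, `endobj`, `R`, `true`, ...), and each delimiter-introduced token (comment, name, literal string, hex string, dictionary/array bracket) gets its own rule.

```ruby
require "strscan"

# Hypothetical forward tokenizer for *text-only* PDF (COS) syntax.
# Illustration only: not the Rouge lexer from this PR, no stream support.
module CosSketch
  WHITESPACE = /[\x00\t\n\f\r ]+/                  # PDF white-space characters
  REGULAR    = /[^\x00\t\n\f\r ()<>\[\]{}\/%]+/   # anything not white space or a delimiter
  NUMBER     = /\A[+-]?(?:\d+\.?\d*|\.\d+)\z/     # integers and reals, e.g. 12, -3.5, .25

  module_function

  def tokenize(src)
    ss = StringScanner.new(src)
    tokens = []
    until ss.eos?
      if ss.scan(WHITESPACE)
        next                                   # white space only separates tokens
      elsif (t = ss.scan(/%[^\r\n]*/))
        tokens << [:comment, t]                # also matches %PDF-2.0 and %%EOF
      elsif (t = ss.scan(/<<|>>|\[|\]|\{|\}/))
        tokens << [:punct, t]                  # dictionary, array, procedure brackets
      elsif (t = ss.scan(/<[0-9A-Fa-f\x00\t\n\f\r ]*>/))
        tokens << [:hex_string, t]
      elsif ss.check(/\(/)
        tokens << [:string, scan_literal_string(ss)]
      elsif (t = ss.scan(/\/[^\x00\t\n\f\r ()<>\[\]{}\/%]*/))
        tokens << [:name, t]                   # a bare "/" is a valid empty name
      elsif (t = ss.scan(REGULAR))
        tokens << [t.match?(NUMBER) ? :number : :keyword, t]
      else
        raise ArgumentError, "unexpected #{ss.peek(1).inspect} at offset #{ss.pos}"
      end
    end
    tokens
  end

  # Literal strings may nest balanced parentheses; a backslash escapes the
  # next character, so "\)" does not close the string.
  def scan_literal_string(ss)
    depth = 0
    buf = +""
    loop do
      ch = ss.getch
      raise ArgumentError, "unterminated literal string" if ch.nil?
      buf << ch
      case ch
      when "\\" then buf << (ss.getch || "")
      when "(" then depth += 1
      when ")"
        depth -= 1
        return buf if depth.zero?
      end
    end
  end
end

src = "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n%%EOF"
CosSketch.tokenize(src).each { |type, text| puts "#{type}\t#{text}" }
```

Note that `%%EOF` comes out as an ordinary comment token even without a trailing EOL, because the comment pattern stops at CR, LF, or the end of input.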

We wish to use this Rouge PDF lexer upstream in current and future PDF ISO standards and specifications authored in AsciiDoc with Metanorma, for the many code fragment examples in those documents.

petervwyatt, Jul 02 '24 06:07

BTW, the PDFs added are fully functional and will work in products such as Adobe Acrobat Reader; they likely just need to be renamed with a .pdf extension. Also be careful with EOL conversions out of GitHub, as PDFs are binary files! Normally I'd control this with a .gitattributes file containing `*.pdf binary`, but because Rouge's demo and sample files don't use file extensions, this isn't possible.
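
To illustrate why EOL conversion matters: a PDF's cross-reference table records the byte offset of each indirect object, so rewriting LF as CR+LF shifts every later object without updating those recorded offsets. The sketch below uses a hypothetical, truncated fragment (not one of the PR's sample files, and with no xref table or trailer) and a helper `object_offset` invented for illustration; it only measures where an object header lands before and after conversion.

```ruby
# Byte offset at which an object header starts: the value an xref entry stores.
def object_offset(doc, header)
  doc.b.index(header.b) or raise ArgumentError, "#{header} not found"
end

# Hypothetical, truncated fragment (no xref table or trailer).
lf_pdf = "%PDF-2.0\n1 0 obj\n<< /Type /Catalog >>\nendobj\n2 0 obj\n"
# What an EOL-normalising checkout might turn it into.
crlf_pdf = lf_pdf.gsub("\n", "\r\n")

before = object_offset(lf_pdf, "2 0 obj")
after  = object_offset(crlf_pdf, "2 0 obj")
puts "2 0 obj starts at byte #{before} with LF, #{after} with CR+LF"
```

Each converted line before an object pushes it one byte further, so a reader relying on the recorded offsets would seek to the wrong place.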

petervwyatt, Jul 02 '24 06:07

The linelint failure is reported against the two sample functional PDF files in lib/rouge/demos/pdf and spec/visual/samples/pdf. PDFs are not required to have an EOL on their last line (after the %%EOF), and the Rouge grammar must support this, which is why the samples are written that way. If this is critical to fix, the EOL can be added, but then there will be no test ensuring that the grammar successfully processes PDFs without a final EOL.
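
A minimal sketch of the requirement, assuming a regex-based check: the final `%%EOF` line should be accepted whether it is followed by nothing or by a single PDF end-of-line marker (CR, LF, or CR+LF). `EOF_MARKER` and `ends_with_eof_marker?` are hypothetical names, not part of Rouge; the input with no final EOL is exactly the case the current sample files exercise.

```ruby
# %%EOF at the very end of input, optionally followed by exactly one
# PDF end-of-line marker: CR+LF, CR, or LF. \z (unlike \Z) is the true end of string.
EOF_MARKER = /%%EOF(?:\r\n|\r|\n)?\z/

def ends_with_eof_marker?(src)
  src.match?(EOF_MARKER)
end

["%%EOF", "%%EOF\n", "%%EOF\r\n", "%%EOF\r", "%%EOF\n\n"].each do |tail|
  puts "#{tail.inspect}: #{ends_with_eof_marker?("%PDF-2.0\n" + tail)}"
end
```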

petervwyatt, Jul 05 '24 00:07

Maintainers (@pyrmont @tancnle @gfx), could you help review this? It would greatly help those of us who regularly work with PDF syntax.

Thank you!

ronaldtse, Sep 10 '25 07:09

I'm sorry, @ronaldtse. I'm no longer a maintainer on this project.

pyrmont, Sep 10 '25 08:09

Apologies for unnecessarily tagging you, @pyrmont. Thank you for the quick response!

ronaldtse, Sep 10 '25 08:09