roff-rs

Apostrophe in contractions is turned into \*(Aq, subsequently swallowed by pandoc

Open teythoon opened this issue 1 year ago • 3 comments

We produce manual pages using roff-rs, then render them as HTML for our website. I have noticed that apostrophes in contractions and in possessives are missing from the produced HTML:

$ cat src/main.rs
fn main() {
    let mut r = roff::Roff::new();
    r.text(vec!["I've been a good boy.".into()]);
    println!("{}", r.render());
}
$ cargo run > astropof.1
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/foobr`
$ cat astropof.1
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
I\*(Aqve been a good boy.

$ man ./astropof.1|hd
00000000  49 27 76 65 20 62 65 65  6e 20 61 20 67 6f 6f 64  |I've been a good|
00000010  20 62 6f 79 2e 0a 0a                              | boy...|
00000017
$ pandoc -o astropof.txt astropof.1
$ cat astropof.txt
Ive been a good boy.
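The apostrophe handling visible in astropof.1 can be approximated with a small stdlib-only Rust sketch. `render_like` and `expand_aq` are hypothetical names and not roff-rs's implementation. Modelling pandoc as a renderer that never evaluates the `.ie`/`.el` preamble, so `Aq` stays undefined and expands to nothing (as groff does for undefined strings), is an assumption that merely matches the observed `Ive`.

```rust
use std::collections::HashMap;

/// Preamble copied verbatim from astropof.1: on groff (`\n(.g` non-zero)
/// the string `Aq` is defined as `\(aq`, on other troff implementations as `'`.
const AQ_PREAMBLE: &str = ".ie \\n(.g .ds Aq \\(aq\n.el .ds Aq '\n";

/// Hypothetical approximation of the output above (not roff-rs's code):
/// every `'` in running text becomes the string interpolation `\*(Aq`.
fn render_like(text: &str) -> String {
    let mut out = String::from(AQ_PREAMBLE);
    out.push_str(&text.replace('\'', "\\*(Aq"));
    out.push('\n');
    out
}

/// Hypothetical model of a renderer expanding `\*(Aq` from the string
/// definitions it has seen. An undefined string expands to nothing.
fn expand_aq(line: &str, strings: &HashMap<&str, &str>) -> String {
    let value = strings.get("Aq").copied().unwrap_or("");
    line.replace("\\*(Aq", value)
}

fn main() {
    let page = render_like("I've been a good boy.");
    print!("{}", page);

    // The text line is the last line of the generated page.
    let body = page.lines().last().unwrap();

    // A renderer that evaluates the preamble defines Aq. On groff it is
    // `\(aq`, which prints as a straight apostrophe; modelled here as `'`.
    let mut defined = HashMap::new();
    defined.insert("Aq", "'");
    println!("{}", expand_aq(body, &defined)); // I've been a good boy.

    // Assumed pandoc-like behaviour: the conditional is skipped, Aq is undefined.
    println!("{}", expand_aq(body, &HashMap::new())); // Ive been a good boy.
}
```

The first three printed lines match the astropof.1 listing; the last two reproduce the man and pandoc results respectively under the stated assumption.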

Now, I'm not an expert on roff, but groff_man_style(7), one of the manual pages I consult for advice on writing manual pages, says not to use \(aq for ordinary apostrophes. From https://man7.org/linux/man-pages/man7/groff_man_style.7.html:

You should not use \(aq for an ordinary apostrophe (as in “can't”)

Through experimentation I discovered that pandoc renders both ' and \(aq just fine.

teythoon, Jan 12 '24 16:01

To clarify: I think there are two issues here:

  • roff-rs unconditionally replaces apostrophes where it shouldn't (e.g. in contractions).
  • roff-rs uses the \*(Aq workaround, which doesn't sit well with pandoc.

I have no idea what to do about either issue, but I wanted to report it.
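One possible direction, sketched under assumptions: follow the groff_man_style(7) advice by leaving apostrophes in prose untouched and using \(aq only in code, which pandoc also handles according to the experiment above. `escape_line` and its `in_code` flag are hypothetical; nothing in this thread says roff-rs exposes such a function. The sketch also guards a line-initial `.` or `'`, which roff would otherwise parse as a control line, and escapes backslashes.

```rust
/// Hypothetical context-aware escaping for a single line of roff text.
/// `in_code` marks literal text (code samples) that readers will copy and paste.
fn escape_line(line: &str, in_code: bool) -> String {
    // Backslash is roff's escape character; `\e` prints a literal backslash.
    let line = line.replace('\\', "\\e");
    let body = if in_code {
        // Code: request the straight apostrophe glyph explicitly with `\(aq`.
        line.replace('\'', "\\(aq")
    } else {
        // Prose: leave ordinary apostrophes alone, e.g. in "I've".
        line
    };
    // A line beginning with `.` or `'` is parsed as a control line;
    // the zero-width escape `\&` keeps it as text.
    if body.starts_with('.') || body.starts_with('\'') {
        format!("\\&{}", body)
    } else {
        body
    }
}

fn main() {
    println!("{}", escape_line("I've been a good boy.", false)); // I've been a good boy.
    println!("{}", escape_line("'Tis the season.", false)); // \&'Tis the season.
    println!("{}", escape_line("echo 'hi'", true)); // echo \(aqhi\(aq
}
```

Deciding which lines count as code is left to the caller here; a real library would need to know that from its own document model.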

teythoon, Jan 15 '24 18:01

Thanks for the report!

epage, Jan 15 '24 18:01

I just noticed the documentation of Roff::to_roff says:

Without special handling, apostrophes get typeset as right single quotes, including in words like “don’t”. In most situations, such as in manual pages, that’s unwanted.

That comment gets it wrong. In contractions like "don't", you do want to allow renderers to use fancy glyphs, and in fact that is the glyph rustdoc produces in the quoted example:

$ echo -n don’t | hd
00000000  64 6f 6e e2 80 99 74                              |don...t|
00000007

The byte sequence e2 80 99 is the UTF-8 encoding of U+2019 RIGHT SINGLE QUOTATION MARK. Where you don't want that kind of fancy glyph is in code samples, which people expect to copy and paste and have work correctly.
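The byte-level difference between the two characters can be checked with a short Rust sketch; `utf8_bytes` is just an illustrative helper built on the standard `char::encode_utf8`.

```rust
/// Returns the UTF-8 encoding of a single character.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    // U+2019 RIGHT SINGLE QUOTATION MARK: the typographic glyph in "don’t".
    println!("{:x?}", utf8_bytes('\u{2019}')); // [e2, 80, 99]
    // U+0027 APOSTROPHE: the ASCII character a shell or compiler expects.
    println!("{:x?}", utf8_bytes('\'')); // [27]
    // The same bytes as the `hd` dump of "don’t" above.
    println!("{:x?}", "don\u{2019}t".as_bytes()); // [64, 6f, 6e, e2, 80, 99, 74]
}
```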

The same documentation also says: "You probably want render or to_writer instead of this method."

In fact, I switched to using to_roff, and it yields perfect results for me: both Debian's man and pandoc render apostrophes as byte 27, i.e. APOSTROPHE, in running text as well as in code blocks.

teythoon, Jan 17 '24 10:01