Decide what to do about sentence spacing

Open jcsalomon opened this issue 10 years ago • 5 comments

This follows the thread LaTeX back-end and sentence spacing on the discussion list.

LaTeX defaults to the classical pre-typewriter tradition of making the spaces between sentences wider than those between words. This can be turned off with the \frenchspacing command, and perhaps that will be necessary in the end, but many people who use LaTeX prefer the older style, and I wonder whether an AsciiDoctor-to-LaTeX translator will be capable of getting this right.

(This effect is achievable in HTML-based back-ends as well: With CSS, give paragraphs the word-spacing property of [say] 0.25em, then wrap sentences in a <span> with a class whose word-spacing has been set to zero. And perhaps if the LaTeX back-end gets sentence spacing right, an option to do the same in HTML might be appropriate. But I suspect this will be a feature little desired. At any rate…)
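The CSS approach described in the parenthesis above might look roughly like the following Python sketch, which only generates the markup. The class name `sentence`, the `CSS` string, and the function `wrap_sentences` are assumptions made for illustration, and the input is assumed to be already split into sentences.

```python
import html

# Stylesheet from the description above; the class name "sentence" is hypothetical.
CSS = "p { word-spacing: 0.25em; } span.sentence { word-spacing: 0; }"


def wrap_sentences(sentences):
    """Wrap each sentence in a span so only the gaps between spans are widened."""
    spans = ['<span class="sentence">{}</span>'.format(html.escape(s)) for s in sentences]
    return "<p>" + " ".join(spans) + "</p>"


print(wrap_sentences(["Mr. Smith visited the UN.", "It was very big."]))
```

The spaces inside each span get zero extra word spacing, while the single space between two spans belongs to the paragraph and so picks up the extra 0.25em.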

TeX uses (approximately) the following heuristic for determining sentence-end: a full-stop preceded by a lower-case letter ends a sentence; preceded by an upper-case letter it does not end a sentence (it’s assumed to be an abbreviation); and closing punctuation such as quotation marks is skipped over. To override this when needed, LaTeX defines the \@ command. See Will Robertson’s LaTeX Alive: Correct punctuation spaces or the following examples:

J. K. Rowling did not shoot J. R\@.  Kristin Shepard did. 

(The \@ before the full-stop tells LaTeX that this does end the sentence, while the other full-stops after capital letters do not.)

Dr.\@ Frankenstein, how good to see you.  Come on in. 

(The \@ after the full-stop tells LaTeX that this does not end the sentence, although normally full-stops after lower-case letters do.)

(I have added an extra space to the code to show more clearly where the desired end-of-sentence spaces are, but this is neither needed nor recognized by LaTeX.)
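The heuristic as summarized above can be sketched in a few lines of Python. This is a simplification for illustration only: real TeX works through per-character space-factor codes rather than a scan like this one, and the function name `sentence_ending_periods` and the `CLOSERS` set are assumptions made for this sketch.

```python
# Closing punctuation that is skipped over after a full stop (assumed set).
CLOSERS = "\"')]\u2019\u201d"


def sentence_ending_periods(text):
    """Return indices of full stops that the simplified rule treats as sentence ends."""
    ends = []
    for i, ch in enumerate(text):
        if ch != "." or i == 0:
            continue
        j = i + 1
        while j < len(text) and text[j] in CLOSERS:
            j += 1
        # Only a full stop followed by whitespace yields an inter-sentence space.
        if j >= len(text) or not text[j].isspace():
            continue
        # Lower-case before the stop: sentence end. Upper-case: assumed abbreviation.
        if text[i - 1].islower():
            ends.append(i)
    return ends


plain = "J. K. Rowling did not shoot J. R. Kristin Shepard did."
print(sentence_ending_periods(plain))  # prints []
```

Without any override, none of the full stops in the first example is treated as a sentence end, including the one after "R", which is the case `R\@.` corrects. Conversely, the full stop after "Dr" would be reported as a sentence end, which is the case `Dr.\@` corrects.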

As I wrote above, perhaps the best thing to do—certainly the easiest—would be to have the LaTeX back-end emit the \frenchspacing command, which will turn all this off. But still—is it at all possible to define a (non-disruptive & optional) syntax which will allow the user to make use of this typographical nicety?

Nov 05 '14 17:11 jcsalomon

I am not a great fan of \frenchspacing. We could certainly have a default and a switch for \frenchspacing. The main question, if I have understood correctly, is what should the default be?

Nov 07 '14 21:11 jxxcarlson

To be absolutely clear to people reading this, since the term “French spacing” is used to mean opposite things (and neither of them French):

  • When \frenchspacing is used, TeX does not expand spaces after full stops;
  • When \nonfrenchspacing is used, TeX does expand spaces after full stops.

And @jxxcarlson, the issue is not so much what the default should be as how we should indicate the equivalent of LaTeX’s \@ when a \nonfrenchspacing régime is in effect. If a suitable syntax cannot be found, we will have little choice but to use the \frenchspacing that neither of us likes.

Nov 12 '14 19:11 jcsalomon

I’ve thought about this again, and I may have a possible solution. Is it possible to teach standard AsciiDoctor to completely ignore \@ while AsciiDoctor-LaTeX passes it through to the LaTeX back-end? Or (if this would be easier) perhaps some other symbol-sequence, which vanilla AsciiDoctor ignores and AsciiDoctor-LaTeX translates into \@ for the LaTeX back-end.

Jun 05 '16 03:06 jcsalomon

As it turns out, Asciidoctor already ignores \@. So you are free to use that in the input. Naturally, this will get left behind by other converters. What I recommend, then, is introducing an inline macro that has a custom regex to match \@. You can then match \@ and convert it to something else in other backends (such as in HTML).

For an example of an inline macro with a custom regex, see https://github.com/asciidoctor/asciidoctor-extensions-lab/blob/master/lib/mentions-inline-macro.rb

You could play other tricks such as using double or triple spaces. That way, all other converters just ignore the repeated space.

Finally, you could have an implicit attribute such as ssp (for sentence space). Then you can simply reference it as:

J. K. Rowling did not shoot J. R{ssp}.

That's probably the most universal solution, despite being slightly less elegant. Of course, you could support both the implicit attribute and the inline macro and let the user decide.
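To make the intended behaviour concrete, here is a minimal Python sketch of how the marker could be resolved per backend. It uses plain string substitution and no Asciidoctor API; a real implementation would presumably be a Ruby extension. The function name `render_marker` and the backend name `"latex"` are assumptions.

```python
def render_marker(text, backend):
    r"""Resolve the hypothetical {ssp} marker (and a literal \@) for one backend."""
    # Assumption: only a backend named "latex" understands \@; all others drop it.
    mark = "\\@" if backend == "latex" else ""
    return text.replace("{ssp}", mark).replace("\\@", mark)


source = "J. K. Rowling did not shoot J. R{ssp}."
print(render_marker(source, "latex"))
print(render_marker(source, "html5"))
```

For the LaTeX backend the marker becomes `\@`; for any other backend it simply disappears, so the HTML output reads as ordinary text.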

Aug 15 '16 07:08 mojavelinux

Here’s a plausible input heuristic: A period (possibly followed by single and/or double close-quotes) ends a sentence if-and-only-if either a double space or a new-line follows.

I.e.,

Mr. Smith visited the UN.  It was very big.

The single space after “Mr.” means it’s not a sentence-ending period; the double space after “UN.” means that one is, so the translator has enough information to yield the LaTeX code

Mr.\@ Smith visited the UN\@.  It was very big.

The question is, of course, whether the Asciidoctor scanning code preserves enough of the input to apply this heuristic in the processing.
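As a rough sketch of that heuristic, the following Python function works on the raw source text, before any whitespace normalization, and inserts `\@` only where TeX's own rule would otherwise guess wrong. It assumes the double spaces are still present in the input, which is exactly the open question above; the names `to_latex` and `CLOSE_QUOTES` are made up for this sketch.

```python
import re

# Close-quotes that may sit between the full stop and the following space (assumed set).
CLOSE_QUOTES = "'\"\u2019\u201d"
PERIOD = re.compile(r"\.([" + re.escape(CLOSE_QUOTES) + r"]*)(\s+)")


def to_latex(text):
    """Insert \\@ where the double-space rule disagrees with TeX's default guess."""
    def fix(m):
        quotes, space = m.group(1), m.group(2)
        prev = m.string[m.start() - 1] if m.start() > 0 else ""
        # Input rule: a double space or a new-line marks a sentence end.
        ends_sentence = "\n" in space or len(space) >= 2
        if ends_sentence and prev.isupper():
            return "\\@." + quotes + space  # TeX would assume an abbreviation
        if not ends_sentence and prev.islower():
            return ".\\@" + quotes + space  # TeX would assume a sentence end
        return m.group(0)
    return PERIOD.sub(fix, text)


print(to_latex("Mr. Smith visited the UN.  It was very big."))
# prints: Mr.\@ Smith visited the UN\@.  It was very big.
```

Full stops where the input rule and TeX's default already agree are left untouched, so the output carries `\@` only where it is needed.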

May 23 '18 23:05 jcsalomon