djot

em, i, cite

Open snan opened this issue 2 years ago • 11 comments

A lot of the time when we use italics it's for emphasized text (<em>); other times it's a book title (<cite>), or a quote in some weirdo other language, or a Linnaean flower name (in which case we have to use <i>). The CommonMark way to do that is to use raw HTML, but that's more cumbersome in djot, and raw HTML isn't something we wanna leave on for world-readable forums and wikis anyway.

That's why I suggest that djot produce <b> and <i> instead of <strong> and <em>. Since the former are hypernyms or supersets of the latter, they're never wrong; it's just that a lot of the time the latter are more precise (at the expense of sometimes being completely wrong).

(The other thing I've always wanted to change about Markdown is supporting • for list bullets.)

snan · Jul 20 '22

I agree with the need to express different semantics (I would particularly like to use <q> for inline quotations), but I disagree with defaulting to <i> and <b> instead of <em> and <strong>.

I think Textile has an interesting solution: a single * is used for strong and ** for bold; likewise _ for emphasis vs. __ for italics.
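To make the Textile rule concrete, here is a toy Python sketch that maps a single delimiter to the semantic tag and a doubled delimiter to the presentational one. It is not Textile's actual parser. The function name `render_inline` and the regex are made up for illustration, and the sketch ignores nesting, escaping and word-boundary rules.

```python
import re

# Toy version of the rule described above: single delimiter -> semantic tag,
# doubled delimiter -> presentational tag. Real Textile has many more rules.
TAGS = {
    "**": "b",
    "__": "i",
    "*": "strong",
    "_": "em",
}

# Doubled delimiters are tried first in the alternation so "**bold**"
# becomes <b>, not <strong>. Content must start and end with non-space.
PATTERN = re.compile(r"(\*\*|__|\*|_)(\S(?:.*?\S)?)\1")

def render_inline(text):
    """Replace simple, non-nested delimiter spans with HTML tags."""
    def repl(m):
        tag = TAGS[m.group(1)]
        return f"<{tag}>{m.group(2)}</{tag}>"
    return PATTERN.sub(repl, text)

print(render_inline("*strong* and **bold**, _em_ and __italic__"))
# → <strong>strong</strong> and <b>bold</b>, <em>em</em> and <i>italic</i>
```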

They don't currently extend that logic to other inline elements (<del> vs. <s>, <ins> vs. <u>, <code> vs. <tt>), but I've suggested that in https://github.com/textile/textile-spec/issues/5.

I think if Djot were to adopt this system, it would be great to apply it across the board for all presentational elements that have semantic counterparts.

waldyrious · Nov 06 '22

I don't want to use doubled delimiters; see the Beyond Markdown essay linked from the README for an explanation.

jgm · Nov 06 '22

My bad. I had just re-read it minutes before writing my comment above, and I totally agree. Somehow it slipped my mind when writing my comment. Apologies for the noise in that regard.

That said, I still believe that if non-semantic tags are to be included in Djot, implementing them with syntax that approximates the corresponding semantic variant (for example, using the same delimiter with an additional modifier, such as *!foobar!*, and maybe only allowing this within a {...} wrapper) might be preferable to coming up with a separate (yet somehow mnemonic) set of symbols for the presentational tags.

waldyrious · Nov 06 '22

In Markdown it's easy to remember: <i> for i, <cite> for cite, and then the shortcut * or _ for the most common case, which is em. No need for mnemonics 💁🏻‍♀️

snan · Nov 06 '22

A while ago in the CommonMark forum someone suggested "" … "" for <cite>, which I still like. I know there’s a policy against double delimiters, but the problem seems to me to specifically arise with repeated-character delimiters when the corresponding single character also has special syntactic meaning. As long as " alone doesn’t acquire a special meaning, "" should be fine.

As for <i>, the old ASCII convention of / … / around a word seems fine to me in the context of Djot, since intraword slashes wouldn’t be mistaken for intraword italics.

dpk · Nov 06 '22

Unfortunately /.../ for italics would be a disaster for any linguist (like me) using djot because /.../ has a very specific special meaning in linguistics. Nobody will want to type \/...\/ all the time! I guess {/.../} might work though.

bpj · Nov 07 '22

I would argue against using <b> and <i> to replace <strong> and <em>. The two might seem equivalent to you, but in non-Latin scripts the entire premise of bold or italics can fall horribly flat. Take hanzi/hanja/kanji: you will never see italics there the way we use them in Western-style texts, and cursive script has a totally different function.

ashemedai · Dec 06 '22

For Western text I have opinionated thoughts about decoupling the abstract concept of emphasis from italic/bold/small-caps, because in various fields it is the actual font styles that are imbued with semantics: e.g. linguists use italics for object language and Romanicists use small-caps for proto-Romance words, and swapping the styles just isn't an option, because those particular font styles are standardized markup in the field. That's not really emphasis but another use of font styles. For that reason I think there should be markup for bold, italics, small-caps, underline and strikeout, but it should be separate from the markup for abstract emphasis, insertion and deletion as discussed in #10.

I have been reluctant to bring up the question of separate markup for bold and italics because I have no good suggestion for a syntax for bold, and I'm far from sold on {/italics/}, and absolutely not /italics/ for the reason stated earlier in this thread. I know @jgm doesn't like doubled delimiters, and I very much agree WRT "bare" delimiters, but perhaps they might work when combined with curly brackets? If so, {**bold**} and {__italics__} might work, since anyone who wants to put abstract emphasis inside abstract emphasis of the same kind can use the curly brackets, like {_{_em in em_}_}. This might work with {||double underline||} as per #10 too, unless {++underline++} vs. {+insertion+} and {--strikeout--} vs. {-deletion-} might work as well!

bpj · Dec 06 '22

<i> is not italics and <b> is not bold.

The <i> HTML element represents a range of text that is set off from the normal text for some reason, such as idiomatic text, technical terms, taxonomical designations, among others. Historically, these have been presented using italicized type, which is the original source of the <i> naming of this element. (source)

The <b> HTML element is used to draw the reader's attention to the element's contents, which are not otherwise granted special importance. This was formerly known as the Boldface element, and most browsers still draw the text in boldface. (source)

The correct way to handle Proto-Romance terms in HTML semantics is <i lang=roa>amīcu</i> and a CSS rule such as

i:lang(roa) {
    font-style: inherit;
    font-variant: small-caps;
}

(although the convention is bad and confusing and should be replaced with the use of italics with * as in the rest of historical linguistics where possible, imo)

dpk · Dec 06 '22

@dpk Read my comment again: I didn't mention <i> or <b>, did I? Just "italics", "bold", etc. Presumably an HTML renderer might render those as <i> and <b>, but I don't really care, because my primary concern is that my non-emphasis italics and bold are marked up differently, e.g. \textit{...} rather than \emph{...}.

And we can discuss lang tags the day they provide the granularity historical linguists really need without lang="x-very-long-tags-every-where", and text-to-speech can really render Old French properly. I can and do use spans with classes at an appropriate granularity, e.g. .obj for object language and .graph for graphemic, and then I jump through hoops with Pandoc filters to have each rendered correctly in LaTeX, email and what have you, e.g. [foo]{.graph} becoming "⟨foo⟩" or "‹foo›" depending on which characters I can expect available fonts to support in a given medium.
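The span-with-class approach boils down to a dispatch from (class, output medium) to a wrapper. The sketch below is plain Python, not a Pandoc filter (a real filter would walk Pandoc's document AST). The function `render_span`, the medium names, and which brackets each medium gets are all hypothetical choices for illustration.

```python
# Hypothetical (class, medium) -> template table. The medium names and the
# bracket choice per medium are assumptions made for this illustration.
RENDERERS = {
    ("obj", "html"): "<i>{}</i>",        # object language: non-emphasis italics
    ("obj", "latex"): "\\textit{{{}}}",  # \textit rather than \emph
    ("graph", "html"): "⟨{}⟩",           # graphemic: angle brackets
    ("graph", "email"): "‹{}›",          # glyphs assumed safer in plain-text email
}

def render_span(text: str, cls: str, medium: str) -> str:
    """Wrap text according to its span class and the target medium."""
    template = RENDERERS.get((cls, medium))
    if template is None:
        return text  # unknown combination: leave the text unstyled
    return template.format(text)

print(render_span("foo", "graph", "html"))   # ⟨foo⟩
print(render_span("foo", "graph", "email"))  # ‹foo›
print(render_span("foo", "obj", "latex"))    # \textit{foo}
```

Keeping the class (what the text *is*) separate from the table (how each medium shows it) is what lets the same source produce different glyphs per output.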

Off topic: as for the Romanicist small-caps convention, the point of it is that the boundaries between proto-Romance, Vulgar Latin and Classical Latin are fuzzy and fussy at best and not always relevant. I simplified for the non-experts, but the fact is that UĬDĒRE, *vẹdẹ̄re, uĭdēre and ‹uidere› are four different levels of decreasing abstraction, each with its proper uses, and they mean four different things.

bpj · Dec 07 '22