Suggested syntax for {|underline|}, {!strikeout!} and {.small caps.}

Open bpj opened this issue 2 years ago • 18 comments

In the announcement thread on pandoc-discuss I suggested adding these syntaxes:

{|underline|} 
{!strikeout!}
{.small caps.}

@jgm asked me to open an issue here.

I would very much appreciate having a syntax for small caps in particular.

bpj avatar Jul 18 '22 18:07 bpj

Note that djot already has syntax for:

  • {-deleted-} (strikeout / strikethrough) and
  • {+inserted+} ("underline").

As for smallcaps... I'm seeing three kinds of span syntax in djot:

  • text bounded by {char and char} (where sometimes the braces are optional)
    • eg.: {*bold*}, {_italic_}, {+inserted+}, {-deleted-}, {~sub~}, {^sup^}, {=highlighted=}
  • text bounded by colons (note: adding curlies {:foo:} stops the colon syntax)
    • eg.: :smiley:, :+1:
  • text bounded by backticks and prefixed with something
    • eg.: $`a = b + c` for math
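
To make the three categories above concrete, here is a rough Python sketch that sorts a span into one of them. The pattern names and the `classify` helper are hypothetical, and the regexes only mirror the informal taxonomy in this comment; they are not djot's actual grammar (no nesting, escapes, or attributes).

```python
import re

# Hypothetical patterns mirroring the three informal categories above.
# They are NOT djot's real grammar (no nesting, escapes, or attributes).
BRACED = re.compile(r"\{([*_+\-~^=])(.+?)\1\}")   # {*bold*}, {-deleted-}
SYMBOL = re.compile(r"(?<!\{):([\w+-]+):(?!\})")  # :smiley:, :+1:
PREFIXED = re.compile(r"(\$)`([^`]*)`")           # $`a = b + c`


def classify(span: str) -> str:
    """Return which of the three categories a whole span falls into."""
    if BRACED.fullmatch(span):
        return "braced"
    if SYMBOL.fullmatch(span):
        return "symbol"
    if PREFIXED.fullmatch(span):
        return "prefixed-verbatim"
    return "unknown"
```

With these patterns, `{:foo:}` lands in "unknown", which matches the note that adding curlies stops the colon syntax.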

So, some alternatives for a smallcaps syntax that might work with djot:

  • {.Some Small Caps.}
  • Can't just use .bound by dots. (too common in regular text), though maybe prefix bounding colons, like .:Some Small Caps:
  • .`Some Small Caps` (prefixing the backticks by something other than $)

I kinda' like {.That First Option.}. The only problem is that {.foo.} (for smallcaps) looks an awful lot like {.foo} (for attributes (what you'd write to get class="foo")) --- I think those are just too similar.

Maybe another punctuation character:

{,Some Small Caps,}

{;Some Small Caps;}

I think the comma looks good --- maybe even better than the dot; if you squint, the comma looks a little like a very tiny arrow pointing down, which is very nice for small caps. :smiley:

uvtc avatar Jul 20 '22 02:07 uvtc

Only the first form works to apply formatting to an arbitrary sequence of inline elements. The second and third form operate on plain unformatted text.

For other characters, I think =, #, %, and . are out, because of conflicts with attribute (and command and raw) syntax. That leaves

{,
{@
{$
{&
{|
{,
{;
{/
{?

jgm avatar Jul 20 '22 23:07 jgm

Hm. My 2 cents:

  • {/Foo Bar/} looks like it would be for italic, but there's already syntax for italic.
  • $ is already used for math.
  • {|Foo Bar|} is pretty, though not sure what it looks like it would be syntax for.
  • {?Foo Bar?} looks too much like a question ({?Have you tried the eggplant?})
  • {&Foo Bar&} and {@Foo Bar@} are pretty bulky, and, like |, they don't suggest any markup to me.

That leaves the comma and the semicolon.

uvtc avatar Jul 21 '22 01:07 uvtc

Maybe:

  • {,small caps,}
  • {!strikeout!}
  • {;underline;}

Problem is that none of these are at all suggestive of what they mean. ! looks like it is emphasizing the text rather than striking it out. , is not TOO bad for small caps. ; is horrible for underline. If we wanted underline, it could be better to reserve {_ for that and use {/ for italic emphasis. However, it's important to have a syntax for emphasis that generally doesn't require the {, and / is widely used between words.

One option could be to require the { when _ or * or / is used inside a word, and only allow them to be used "bare" when they are on the edge of a word. I considered that option earlier, but identifying word boundaries is tricky unless we build a lot of unicode logic into the parser (detecting character classes), which I'd hoped to avoid.
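
For a sense of what "unicode logic" would mean here, the following is a minimal Python sketch, not anything from djot itself. The helper names are hypothetical, and treating the Unicode general categories L, M, and N as "word characters" is an assumption made for illustration. It checks whether a delimiter sits inside a word (where braces would be required under this option) or on a word edge.

```python
import unicodedata


def is_word_char(ch: str) -> bool:
    """Treat Unicode letters, marks, and numbers as word characters (an assumption)."""
    return unicodedata.category(ch)[0] in ("L", "M", "N")


def delimiter_is_intraword(text: str, i: int) -> bool:
    """True if the delimiter at text[i] has a word character on both sides."""
    before = text[i - 1] if i > 0 else " "
    after = text[i + 1] if i + 1 < len(text) else " "
    return is_word_char(before) and is_word_char(after)
```

Python ships the Unicode character database, so the check is short here. A parser in a language without those tables would have to carry them itself, which is the cost mentioned above.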

jgm avatar Jul 26 '22 17:07 jgm

, gives me thin space vibes (from LaTeX, where \, is a thin space). Not sure if something like that would make sense here. But that would be my first hunch.

wooorm avatar Jul 26 '22 18:07 wooorm

@jgm what's wrong with {|underline|}? It's a vertical line but at least a line! The idea with {!strikeout!} is that strikeout "cancels" text and ! means negation in many programming languages. It's far from perfect but it's something, although I agree that it's not at all obvious to non-programmers.

bpj avatar Jul 26 '22 21:07 bpj

BTW I'm fine with commas for small caps. I just took the dots from my old Perl script which I mentioned on pandoc-discuss, not realizing at the moment that it might clash with classes in attributes. As I mentioned I used to use {/italics/} but /italics/ is horrible. Requiring that it be flanked with whitespace or ASCII punctuation won't cut it, since there is plenty of non-ASCII punctuation.

bpj avatar Jul 26 '22 21:07 bpj

The main reason /italics/ is horrible from a linguist's POV is that it clashes with phonemic notation, no matter what characters you require it to (not) be surrounded by. It's on a par with not allowing mathematicians to use < and > for less-than and greater-than.

bpj avatar Jul 26 '22 21:07 bpj

> @jgm what's wrong with {|underline|}? It's a vertical line but at least a line!

True! Maybe that's not so bad.

So, maybe the best idea would be {|underline|} and {,small caps,} and {!strikeout!}.

jgm avatar Jul 26 '22 21:07 jgm

> > @jgm what's wrong with {|underline|}? It's a vertical line but at least a line!
>
> True! Maybe that's not so bad.
>
> So, maybe the best idea would be {|underline|} and {,small caps,} and {!strikeout!}.

@jgm, are you suggesting replacing:

  • {+underline+} with {|underline|}, and
  • {-strikeout-} with {!strikeout!}?

I think that {-strikeout-} already looks good. It looks like it suggests strikethrough / strikeout. {!this!} not only makes me think "warning", but also, the tall slim characters (including |) don't look good inside the curlies, IMO.

Underline markup is not used very often. If I needed it, and if it weren't {_underline_} I bet I'd have to look up its syntax to figure out whether it's {+this+} or {|this|} (neither of which makes me think, "underline").

Since bold is used much less often than italic, if I were starting from scratch, I'd consider:

  • *italic* or {*italic*}
  • {+bold+}
  • {_underline_} (aka inserted)
  • keep {-strikeout-} (aka deleted)
  • {,Small Caps,}

and keep the others as they are ({~sub~}, {^sup^}, {=highlighted=}).

uvtc avatar Jul 27 '22 02:07 uvtc

@uvtc I can't answer for @jgm but my thought is not to replace anything, but rather that <ins> and <del> are specifically for material which was inserted/marked for deletion in the current revision which will be unmarked/removed in the next revision and so are inappropriate for material which is to be more "permanently" underlined/struck out for whatever reason. While it is true that the Pandoc AST doesn't currently have any dedicated elements for making the distinction — beyond rendering one or the other with a span with a class — I think it is an important distinction and I see no reason why djot cannot make it. Djot can output HTML directly and the distinction may anyway (unfortunately) be moot in some other formats like LaTeX, so IMO its being lost in transition to the Pandoc AST isn't a huge deal, however unfortunate. Thanks to the braces it is feasible to remove deleted material or remove the {+ and +} markup with regex[^1], which provisionally makes the semantic distinction meaningful anyway.

[^1]: In most dialects something like \{\-.*?\-\} (in Lua %{%-.-%-%}) will do since nested deletions are probably not a thing anyway. In Perl you could handle even that with the Regexp::Common::balanced module:

``````perl
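use Regexp::Common qw[balanced];

# Build the pattern once; it matches balanced {- ... -} spans, nested ones included.
my $deleted = $RE{balanced}{-begin => '{-'}{-end => '-}'};
$text =~ s/$deleted//g;    # $text is assumed to hold the djot source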
``````
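
The same idea in Python, as a minimal sketch: it uses only the non-nested regex from the footnote, and the `accept_changes` helper name is made up for illustration. It removes {-deleted-} spans and unwraps {+inserted+} spans, leaving any other braced markup alone.

```python
import re

# Non-greedy matches, assuming deletions and insertions are not nested.
DELETED = re.compile(r"\{-.*?-\}", re.DOTALL)
INSERTED = re.compile(r"\{\+(.*?)\+\}", re.DOTALL)


def accept_changes(text: str) -> str:
    """Drop deleted spans and keep only the contents of inserted spans."""
    text = DELETED.sub("", text)
    return INSERTED.sub(r"\1", text)
```

For example, `accept_changes("a {-old-}{+new+} b")` gives `"a new b"`, while a hypothetical `{!struck!}` span would pass through untouched.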

bpj avatar Jul 27 '22 09:07 bpj

> @uvtc I can't answer for @jgm but my thought is not to replace anything, but rather that <ins> and <del> are specifically for material which was inserted/marked for deletion in the current revision which will be unmarked/removed in the next revision and so are inappropriate for material which is to be more "permanently" underlined/struck out for whatever reason.

For discussion about ins del sub etc. see https://github.com/jgm/djot/issues/15.

dumblob avatar Jul 27 '22 19:07 dumblob

Yes, that's my thinking. Semantically, "inserted" and "deleted" are different from "underline" and "strikethrough," even if that's how browsers render them typically.
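
To illustrate the distinction, here is a small hypothetical Python renderer. The `<ins>` and `<del>` mappings follow the thread. The tags for the proposed {|...|}, {!...!} and {,...,} syntaxes are assumptions for illustration only, since nothing in this thread settles what djot would emit for them.

```python
import html
import re

# Semantic change markup vs. purely presentational markup.
# The last three entries are assumed mappings for the proposed syntaxes.
TAGS = {
    "+": ("<ins>", "</ins>"),
    "-": ("<del>", "</del>"),
    "|": ("<u>", "</u>"),
    "!": ("<s>", "</s>"),
    ",": ('<span class="smallcaps">', "</span>"),
}

SPAN = re.compile(r"\{([+\-|!,])(.+?)\1\}")


def render(text: str) -> str:
    """Render flat (non-nested) spans to HTML; everything else is escaped text."""
    def repl(match: re.Match) -> str:
        open_tag, close_tag = TAGS[match.group(1)]
        return open_tag + match.group(2) + close_tag

    # Escape first; escaping does not touch the delimiter characters.
    return SPAN.sub(repl, html.escape(text))
```

Here `{+added+}` and `{|underlined|}` may look the same in a browser, but they come out as different elements, which is the point being made.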

jgm avatar Jul 27 '22 19:07 jgm

In that case, then those look to me like pretty good uses of {|pipes|}, {!bangs!}, and {,commas,}.

The {|pipes|} make some sense for underlines to me too, since pipes are also used for tables (which are, of course, themselves made up of lines). And the bangs seem good too given, as @bpj points out, their association with "not".

It seems like a nice feature of djot that it may contain not only ways to mark insert and delete (for providing feedback to someone on proposed changes), but also explicit underline and strikethrough. I don't know of another light markup format that provides that.

And it also leaves (at the least) {@at@}, {&amp&}, {;semi;}, and {?question mark?} still available for possible future use if something else is needed down the road.

BTW, I really like djot's simplicity of using the curlies to disambiguate syntax when necessary.

uvtc avatar Jul 30 '22 17:07 uvtc

> > @uvtc I can't answer for @jgm but my thought is not to replace anything, but rather that <ins> and <del> are specifically for material which was inserted/marked for deletion in the current revision which will be unmarked/removed in the next revision and so are inappropriate for material which is to be more "permanently" underlined/struck out for whatever reason.
>
> For discussion about ins del sub etc. see #15.

See also #13 for discussion about underline and strike-through (including an alternative syntax proposal for them).

waldyrious avatar Nov 06 '22 11:11 waldyrious

@waldyrious what do you mean about #13? I see nothing there which is relevant to this, or in the spirit of djot; djot specifically rejects doubled delimiter characters, IMO for good reasons.

bpj avatar Nov 06 '22 11:11 bpj

Indeed, I was careless with my comment there and the reference to the thread here — my apologies. I have now added a (hopefully) more considered comment to that thread.

The relevance of both of my comments there to this issue lies in the syntax proposal for presentational tags, including those discussed here, namely underline and strikeout.

waldyrious avatar Nov 06 '22 17:11 waldyrious

I was reading the cheatsheet, and I noticed that code didn't have support for {` and `}, which would bring it in line with italic and bold:

| Markup | Result |
|---|---|
| _italic_ or {_italic_} | italic |
| *bold* or {*bold*} | bold |
| `verbatim/code` | verbatim/code |

evanrelf avatar Nov 07 '22 03:11 evanrelf