norg-specs
Should attached modifiers care about the kind of escaped character?
In norg syntax, `\a` is equivalent to `a`. But if `\a` is just a normal character, `\a*bold*` should not be valid bold.
And if the example above is not valid, `*bold\**` must also be invalid.
This is tricky for a parser because parsers work at the token level.
I agree that the first example is invalid because the opening attached modifier does not have whitespace preceding it. But the second example is valid since the closing modifier is valid.
If the first example is valid, it means one of these:
- we see `\` and `a` as two separate nodes, so the `*` at the 3rd column is an invalid bold opener because it comes after the word character `a`
- we see `\a` as a single node, but escaped word characters behave differently from escaped punctuation like `\?`
In the first case, the second example should be invalid bold because `**` is an invalid bold closer.
I really don’t like the second case: two different types of escaped character nodes.
What mrossinek proposes is sensible here. From the parser's perspective, this may indeed mean that it has to differentiate escaped punctuation versus escaped characters, but our rules in the spec are pretty explicit. They discuss the immediate previous and next characters, so the parser has to "figure it out" when it comes to escape sequences.
On a side note, now that super verbatim could exist, what are your thoughts on escape characters in general? They're nice and convenient, but could they be superseded by some other syntax?
The spec also says that a repeated `*` is not a valid open/close modifier. So if the first example is valid, the second example should not be valid unless we treat `\a` and `\*` as different types of escaped modifiers.
Two or more consecutive attached modifiers of the same type (i.e. `**`, `//` etc.) should be instantly "disqualified" and parsed as raw text in all circumstances and without any exceptions.
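As a rough illustration of how this rule could interact with escapes, here is a minimal Python sketch. It assumes that an escaped character never counts as a modifier, which is exactly the point under discussion in this thread, and it uses a small illustrative subset of modifier characters. The function name `disqualified_indices` is hypothetical and not part of any norg tooling.

```python
def disqualified_indices(line: str, modifiers=frozenset("*/_-")) -> set[int]:
    """Return indices of modifier characters that sit in a run of two or
    more identical, unescaped modifiers (and so would be raw text)."""
    raw: set[int] = set()
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch == "\\" and i + 1 < n:
            # Assumption: the backslash and the character it escapes are
            # skipped together and never start or extend a run.
            i += 2
            continue
        if ch in modifiers:
            j = i
            while j < n and line[j] == ch:
                j += 1
            if j - i >= 2:
                raw.update(range(i, j))
            i = j
            continue
        i += 1
    return raw


print(disqualified_indices("**bold**"))    # both pairs are disqualified
print(disqualified_indices(r"*bold\**"))   # the escaped * breaks the run
```

Under that assumption, the `\*` in `*bold\**` does not form a `**` run with the final `*`. If escapes were instead split into `\` and a separate `*` node, the result would differ.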
My thoughts on escape characters haven’t changed from the start. They should take precedence over all grammars except free-form (currently “super verbatim” and “verbatim ranged tag”).
Maybe I need to be more explicit. First, let me paraphrase the rules from the spec:
- an opening attached modifier must have whitespace or punctuation in front of it and no whitespace after it
- a closing attached modifier reverses this: no whitespace in front and whitespace or punctuation after it
Given that, let us look at your first case:
\a*bold*
`\a` is an escaped "a" character, i.e. a "verbatim" "a". This is not whitespace or punctuation, which means that the first `*` is not an opening modifier, thus rendering this example invalid.
The second case:
*bold\**
- The opening modifier is fine.
- Then we have the word `bold`. Nothing special going on here.
- Then we have `\*`, which is an escaped `*` character, i.e. a verbatim `*`.
- Then we have the second `*`. In front of it is no whitespace and it has whitespace (a line break) after it. Thus, this is a valid closing modifier.
- Therefore, this example is valid, and the contents of the bold segment should be "bold*".
- Note: the repetition of `*` is not an argument here, because the first one is escaped and has no effect on the second character.
To paraphrase: the backslash escapes whatever character comes next, therefore rendering it verbatim. No differentiation on whatever is escaped has to be done.
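To make the reasoning above concrete, here is a minimal Python sketch of the paraphrased rules. It is not how any existing norg parser works; the names `Token`, `tokenize`, `can_open` and `can_close` are made up for illustration, the punctuation set is plain ASCII, and the start and end of a line are assumed to count as whitespace.

```python
from dataclasses import dataclass

PUNCTUATION = set("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")


@dataclass(frozen=True)
class Token:
    kind: str  # "escape", "whitespace", "punctuation" or "word"
    text: str


def tokenize(line: str) -> list[Token]:
    tokens = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # One node for the backslash plus whatever it escapes;
            # no distinction between escaped words and punctuation.
            tokens.append(Token("escape", line[i:i + 2]))
            i += 2
            continue
        if ch.isspace():
            kind = "whitespace"
        elif ch in PUNCTUATION:
            kind = "punctuation"
        else:
            kind = "word"
        tokens.append(Token(kind, ch))
        i += 1
    return tokens


def _neighbours(tokens: list[Token], i: int) -> tuple[str, str]:
    # Assumption: line boundaries behave like whitespace.
    prev = tokens[i - 1].kind if i > 0 else "whitespace"
    nxt = tokens[i + 1].kind if i + 1 < len(tokens) else "whitespace"
    return prev, nxt


def can_open(tokens: list[Token], i: int) -> bool:
    prev, nxt = _neighbours(tokens, i)
    return prev in ("whitespace", "punctuation") and nxt != "whitespace"


def can_close(tokens: list[Token], i: int) -> bool:
    prev, nxt = _neighbours(tokens, i)
    return prev != "whitespace" and nxt in ("whitespace", "punctuation")


first = tokenize(r"\a*bold*")
print(can_open(first, 1))    # the * right after the escaped a

second = tokenize(r"*bold\**")
print(can_open(second, 0), can_close(second, 6))
```

Because an `escape` token is neither `whitespace` nor `punctuation`, the `*` after `\a` fails `can_open`, while the final `*` in `*bold\**` passes `can_close`, without the tokenizer ever looking at which character was escaped.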
This escaping has higher precedence than all attached modifiers except the super verbatim suggested in #33. Otherwise writing inline math using LaTeX would be very cumbersome. (That, however, has some more discussion here: https://github.com/nvim-neorg/norg-specs/issues/34#issuecomment-2248869659).
We might want to re-evaluate the precedence of the backslash w.r.t. linkables.
\a is an escaped "a" character, i.e. a "verbatim" "a". This is not whitespace or punctuation
So you are saying that the parser should distinguish `\a` and `\*` as different types of detached modifiers?
Will `*bold*\a` also be invalid bold because `\a` is “verbatim a”?
In my view, `\a` is neither whitespace nor punctuation, but it is not a normal word character either, because it is “escaped”, so it will be highlighted as a special character when rendered as raw content without concealing (e.g. `@string.escape` in Neovim). The parser should not handle the final escaped output (`a` here); it should only see things as abstract objects (`(escaped_sequence [0, 0] - [0, 2])`).
Having two different node types (`escaped_word` and `escaped_punctuation`) for escaped sequences sounds like a bit too much to me.
One possible solution would be to disallow escaped normal word characters, making `\a` invalid in the first place.
I am explicitly stating:
No differentiation on whatever is escaped has to be done.
An escaped character is just that: an escaped character. Any character can be escaped, whether that has any use, is another question, but not one that the spec should care about.
An escaped character is neither whitespace nor punctuation. Therefore, an escaped character:
- can NOT occur in front of an opening attached modifier (because that would mean it is not opening)
- can NOT occur after a closing attached modifier (because that would mean it is not closing)
- it CAN occur in between attached modifiers because that is how you can insert e.g. a `*` character inside a bold segment: e.g. `*my bold \* character*`
- it can NOT occur inside super verbatim (see #33)
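The third bullet can be illustrated with a small Python sketch that computes the rendered contents of a bold segment. It assumes the input is already known to be one valid bold span (opening and closing `*` at both ends), and the helper names `unescape` and `bold_content` are hypothetical.

```python
def unescape(text: str) -> str:
    """Drop each escaping backslash and keep the escaped character verbatim."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def bold_content(segment: str) -> str:
    # Assumption: `segment` is a single, already validated bold span.
    if len(segment) < 2 or segment[0] != "*" or segment[-1] != "*":
        raise ValueError("expected a span delimited by * on both ends")
    return unescape(segment[1:-1])


print(bold_content(r"*my bold \* character*"))
print(bold_content(r"*bold\**"))
```

Note that unescaping happens only after the span boundaries have been decided, so the escaped `*` can never be mistaken for a modifier.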
Oh, I get it now. I hadn’t thought of it like that. Sorry for the misunderstanding.
If an escaped character cannot occur after a closing attached modifier, how can I write `bold:word` with only “bold” as bold and `:word` as literal characters?
*bold*:\word
Should I escape the `w` instead of the `:` to prevent the `:` from being parsed as a link modifier?
Very simple:
*bold*:\:word
- `*bold*` should be clear
- using `*:` makes this a closing link modifier
- then we escape a colon to make it verbatim: `\:`
- and then we write `word`
The link modifier makes this possible because it is fine with not having whitespace after it. It may not even be necessary to escape the second colon character but I would have to double check that.
In the case that the link modifier is opening (the attached modifier appears on the right):
- The link modifier may only be preceded by a regular character (or, in other words, may not be preceded by a punctuation character nor by a whitespace character).
- The link modifier may only be succeeded by an opening attached modifier.
In the case that the link modifier is closing (the attached modifier appears on the left):
- The link modifier may only be preceded by a closing attached modifier.
- The link modifier may only be succeeded by a regular character.
If the above conditions are not met, then the character should be treated as a literal `:`.
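The quoted conditions amount to a small decision table, sketched here in Python. The function `classify_link_modifier` and the neighbour labels are invented for illustration, and the sketch assumes the parser has already classified what sits on either side of the `:`.

```python
def classify_link_modifier(prev: str, nxt: str) -> str:
    """Classify a ':' from the kinds of its neighbours.

    Neighbour kinds (hypothetical labels): "regular", "punctuation",
    "whitespace", "opening_modifier", "closing_modifier".
    """
    if prev == "regular" and nxt == "opening_modifier":
        return "opening"   # e.g. word:*bold*
    if prev == "closing_modifier" and nxt == "regular":
        return "closing"   # e.g. *bold*:word
    return "literal"       # any other combination is a plain ':'


print(classify_link_modifier("regular", "opening_modifier"))
print(classify_link_modifier("closing_modifier", "regular"))
print(classify_link_modifier("closing_modifier", "whitespace"))
```

Read this way, a `:` that follows a closing modifier but is followed by whitespace falls through to the literal case. How an escaped character should be labelled as a neighbour is left open here, since the quoted rules do not say.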
So you are going to change this spec and redefine the attached modifier opening/closing tokens.
Will `*bold*: word` be rendered as `bold word` instead of `bold: word` now?