asciidoc-grammar-prototype
Inline constrained & unconstrained formatting
We could improve the syntax of AsciiDoc as follows:
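Currently you must use "unconstrained" formatting when there is no whitespace around the formatted content. For example, to format only the trailing s in Objects, you have to put the doubled (unconstrained) marks around the s instead of the single (constrained) marks; the latter doesn't work currently.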
IMO we should change this constraint, so that the "normal" (latter) behavior can always be applied, regardless of whether there is whitespace around the formatted content.
@mojavelinux We talked about this on JavaLand.
@sdaschner I'm not sure we want to get rid of the constrained vs unconstrained distinction. The experience with Markdown has taught us that this is actually a benefit of AsciiDoc. For example, see the point in GFM at https://help.github.com/articles/github-flavored-markdown/#multiple-underscores-in-words. Of course, we don't want to sometimes allow constrained in a word (like asterisks) and not others (like underscores), so being consistent is good.
What I do think we should change is the rules about what the boundaries of a word are. Right now, it's difficult to understand because instead of saying that constrained goes around a word, constrained is actually around something that is surrounded by something that isn't a word. This double negative is confusing and has caveats.
Thus, in the spec I'd like to establish what we mean when we say the boundaries of a word in terms a human can understand, with examples to strengthen the meaning. Of course, the parser can be doing some crazy checks to determine what the human thinks is the boundary of the word, but suffice to say, the parsing should do the most logical thing.
Of course, the open question in this thread is still whether single (constrained) marks around the s in Objects should be interpreted as formatting. My position at the moment is that, while tempting, they shouldn't.
A nice way to move this conversation forward might be to put down some examples of when constrained should work and when it's necessary to switch to unconstrained and see if we can clean up the interface between them.
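As a starting point for such examples, here is a minimal Python sketch of the distinction for underscores only. The two regular expressions and the `render_emphasis` helper are assumptions made for illustration; they are a simplification and not the rules Asciidoctor actually uses.

```python
import re

# Assumed, simplified rule (not Asciidoctor's real pattern): a constrained
# mark only opens when it is not preceded by a word character and only
# closes when it is not followed by one.
CONSTRAINED_EM = re.compile(r"(?<!\w)_(\S(?:.*?\S)?)_(?!\w)")

# An unconstrained (doubled) mark may appear anywhere, including inside a word.
UNCONSTRAINED_EM = re.compile(r"__(.+?)__")


def render_emphasis(text: str) -> str:
    """Hypothetical helper: convert emphasis marks to <em> tags."""
    # Handle doubled marks first so they are not read as two single marks.
    text = UNCONSTRAINED_EM.sub(r"<em>\1</em>", text)
    return CONSTRAINED_EM.sub(r"<em>\1</em>", text)


examples = [
    "_Hello_ World",      # constrained, around a whole word
    "Object__s__",        # unconstrained, inside a word
    "Object_s_",          # constrained marks inside a word: left alone
    "hello_cruel_world",  # underscore-separated identifier: left alone
]
for source in examples:
    print(f"{source!r} -> {render_emphasis(source)!r}")
```

Under this sketch the third case is exactly the one in question: the single marks are ignored because the opening underscore follows a word character.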
Hmmm, never thought of that. You're absolutely right, being consistent is pretty important. But I think the unconstrained vs constrained decision is still something that might confuse people (for instance myself at the beginning ;-)), and I would say the distinction between the two is also not quite consistent. What do you think?
To take it further, I'd say there are a few options for a grammar:
- For the sake of consistency, inline formatting always works the same way, and in order to preserve underscore-separated words, the characters must be doubled, for instance. A single pair of marks then formats its content whether it surrounds a whole word, part of a word, or just a trailing s, and Hello__cruel__World -> Hello_cruel_World, where the resulting single characters aren't considered formatting. However, IMO this is a bad option.
- Or like the current AsciiDoc, but we make it clear to the user in the documentation how it works, and (also important) why it works that way (e.g. underscores): single marks format a whole word, as in Hello World, while doubled marks are needed inside a word, as in HelloWorld. Definitely a better option. But, if I get this correctly, isn't the only real reason for this distinction the underscore character, or maybe the acute accents, as they are common in (technical) writing?
- So what about the adventurous approach: ignore that character for formatting, go for alternatives for emphasized text, and change inline formatting to single characters only? Single marks would work around a whole word in Hello World and also inside Hellocruelworld, hello_cruel_world stays the same, and emphasis gets a different pair of characters, e.g. Hello>cruel<world -> Hellocruelworld with cruel emphasized (just a suggestion). Wouldn't that be easier for the user?
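To make the first option above more concrete, here is a rough Python sketch, assuming underscores are the only formatting character: a doubled underscore is read as one literal underscore, and any remaining single pair always formats its content. The `render_option_one` name and the placeholder trick are made up for this illustration.

```python
import re


def render_option_one(text: str) -> str:
    """Hypothetical rule: '__' escapes a literal '_', single pairs always format."""
    placeholder = "\x00"  # assumed never to occur in real input
    # Protect doubled underscores so they are not treated as formatting marks.
    text = text.replace("__", placeholder)
    # Every remaining single pair formats, regardless of word boundaries.
    text = re.sub(r"_(.+?)_", r"<em>\1</em>", text)
    # Restore each protected pair as one literal underscore.
    return text.replace(placeholder, "_")


for source in ["_Hello_ World", "World_s_", "Hello__cruel__World", "hello_cruel_world"]:
    print(f"{source!r} -> {render_option_one(source)!r}")
```

The last example shows the cost of this rule: an identifier written with plain single underscores gets formatted unless the writer remembers to double them.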
What do you think? Or - just to be sure - did I get the reason for the constrained vs. unconstrained distinction correctly? Or do you have other options?
@mojavelinux Any thoughts? Or should we move discussions/brainstorming about that to the Asciidoctor discussion forum?
Actually, the question is the question itself. Where should we start discussing the UniDoc grammar? I suppose here is as good as any place, though we're mixing goals a bit because we'll want the issues filed in a place that's clearly marked. When we get the Discourse forums set up (hopefully soon) we could create a dedicated forum for the discussion. Though we still need a place to record issues. Perhaps we should set up a unidoc-spec repo?
Separating the issues from the AsciiDoc grammar (& ANTLR parser) and the general discussions sounds good!
What about having a GitHub project that hosts not only the discussions/issues about the specification in general, but also documentation and migration pages for changes to the AsciiDoc/UniDoc specification?
I would welcome one centralized place for those things :-)
As soon as we're ready to put a concerted effort into unidoc, we'll definitely create a dedicated repository for it, and probably a list as well. But I don't want to do it too soon because then it can be perceived as a stalled effort. Until then, I'm using unidoc labels on issues in core and the wiki (https://github.com/asciidoctor/asciidoctor/wiki/AsciiDoc-Specification-(aka-UniDoc)-Planning).