commonmark.js: Leading and trailing spaces vanish in multi-line "link text"

In this example, the spaces are preserved:

[ a ](b)

In that example, however, the spaces are swallowed:

[ 
 a 
 ](b)

Even stranger, the first space after the opening bracket creates an empty <text></text> element in the AST, while the last space before the closing bracket creates nothing.

http://spec.commonmark.org/dingus/?text=%5B%20a%20%5D(b)%0A%0A%5B%20%0A%20a%20%0A%20%5D(b)

Most (all?) CommonMark parsers show the same awkward behavior:

http://johnmacfarlane.net/babelmark2/?text=%5B+a+%5D(b)%0A%0A%5B%0A+a+%0A%5D(b)

Jun 03 '17 07:06 mgeier

When we parse a SoftBreak, we swallow leading space on the next line. So, this issue is really about whether there's a reason to do that.
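To make the mechanism concrete, here is a simplified sketch of the newline handling described above. It assumes an inline parser that, on each line ending, strips trailing spaces from the preceding text node (two or more make a hard break) and then skips leading spaces on the next line. It is not the commonmark.js source: the function name `parse_inline_spaces` and the `(kind, literal)` tuple nodes are hypothetical, and brackets and all other inline syntax are ignored. Fed the link text from the second example, it reproduces both oddities from the report: the lone space after `[` ends up as an empty text node, and the space before `]` disappears without a trace.

```python
def parse_inline_spaces(src: str) -> list[tuple[str, str]]:
    """Split plain inline text into text/softbreak/linebreak nodes.

    Hypothetical sketch, not the commonmark.js implementation:
    brackets, emphasis, code spans, etc. are ignored.
    """
    nodes: list[tuple[str, str]] = []
    pos = 0
    while pos < len(src):
        if src[pos] == "\n":
            pos += 1
            last = nodes[-1] if nodes else None
            if last is not None and last[0] == "text" and last[1].endswith(" "):
                # Two or more trailing spaces make a hard break, one a soft break;
                # either way the trailing spaces are stripped from the text node.
                hard = last[1].endswith("  ")
                nodes[-1] = ("text", last[1].rstrip(" "))
                nodes.append(("linebreak" if hard else "softbreak", ""))
            else:
                nodes.append(("softbreak", ""))
            # Swallow the leading spaces of the next line.
            while pos < len(src) and src[pos] == " ":
                pos += 1
        else:
            # A text node runs up to the next line ending (or the end of input).
            end = src.find("\n", pos)
            if end == -1:
                end = len(src)
            nodes.append(("text", src[pos:end]))
            pos = end
    return nodes


# Link text of the two examples above (the brackets themselves are omitted).
print(parse_inline_spaces(" a "))
# [('text', ' a ')]
print(parse_inline_spaces(" \n a \n "))
# [('text', ''), ('softbreak', ''), ('text', 'a'), ('softbreak', '')]
```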

One reason is that, in some modes, a SoftBreak can be rendered as a simple space (the --nobreaks option of cmark). If the following spaces were not swallowed, you'd get undesirable multiple spaces in the output.
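The effect can be illustrated with a hypothetical plain-text renderer that, like cmark in --nobreaks mode, emits every soft break as a single space. The renderer name `render_nobreaks` and the node lists are assumptions for illustration: the lists are hand-written for the input `foo ` followed by a newline and ` bar`, once with the surrounding spaces swallowed and once with all spaces kept.

```python
def render_nobreaks(nodes: list[tuple[str, str]]) -> str:
    """Hypothetical plain-text renderer that, like cmark's --nobreaks mode,
    turns every soft break into a single space."""
    parts = []
    for kind, literal in nodes:
        if kind == "text":
            parts.append(literal)
        elif kind == "softbreak":
            parts.append(" ")
        elif kind == "linebreak":
            parts.append("\n")
    return "".join(parts)


# Hand-written node lists for the input "foo " + newline + " bar".
swallowed = [("text", "foo"), ("softbreak", ""), ("text", "bar")]
kept = [("text", "foo "), ("softbreak", ""), ("text", " bar")]

print(repr(render_nobreaks(swallowed)))  # 'foo bar'
print(repr(render_nobreaks(kept)))       # 'foo   bar' (three spaces)
```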

I can't recall whether there's another reason in addition to this; there might be.

Jun 03 '17 08:06 jgm

@jgm

When we parse a SoftBreak, we swallow leading space on the next line.

That's interesting; I wasn't quite aware of this. I see now that example 619 shows the swallowing behavior, but I somehow thought that the text above that example wasn't normative and that the swallowing was actually caused by some other rule.

I think I was confused by examples 183 and 184, which kind of suggest that the behavior is caused by the rules for paragraphs. However, those rules talk about "concatenating the lines removing initial and final whitespace", which most likely means that only the very first and very last whitespace has to be removed. I think examples 183 and 184 deserve a bit more text explaining why that whitespace is removed.
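The two possible readings of that phrase differ only at the interior line boundaries. A small sketch with hypothetical helper names makes the difference visible: under reading A, whitespace around the inner line ending survives the paragraph stage, so its removal has to come from some other rule, such as the SoftBreak handling mentioned above.

```python
def join_strip_ends(lines: list[str]) -> str:
    # Reading A: concatenate, then strip only the very first and very last whitespace.
    return "\n".join(lines).strip()


def join_strip_each(lines: list[str]) -> str:
    # Reading B: strip initial and final whitespace from every line.
    return "\n".join(line.strip() for line in lines)


lines = ["  aaa ", " bbb "]
print(repr(join_strip_ends(lines)))  # 'aaa \n bbb'
print(repr(join_strip_each(lines)))  # 'aaa\nbbb'
```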

So, this issue is really about whether there's a reason to do that.

That's indeed an interesting question.

I don't understand why it seems to be important to avoid repeated spaces around line breaks, while internal consecutive spaces are not collapsed (see example 622).

Also, the output technologies (e.g. HTML browser, LaTeX processor) normally take care of collapsing consecutive spaces anyway, don't they?
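For HTML output, this can be checked against a rough model of browser whitespace handling. The sketch below (the function name `collapse_like_html` is hypothetical) approximates the CSS `white-space: normal` rule, under which every run of whitespace is displayed as a single space; it ignores `<pre>`, other `white-space` values, and trimming at line edges. Under that assumption, a soft break rendered as a newline looks the same whether or not the surrounding spaces were swallowed.

```python
import re


def collapse_like_html(text: str) -> str:
    """Rough approximation of how a browser displays text under the CSS rule
    white-space: normal: every run of whitespace becomes one space.
    Ignores <pre>, other white-space values, and trimming at line edges."""
    return re.sub(r"[ \t\n\r\f]+", " ", text)


swallowed = "foo\nbar"   # soft break with surrounding spaces swallowed
kept = "foo \n bar"      # soft break with all spaces kept
print(collapse_like_html(swallowed))  # foo bar
print(collapse_like_html(kept))       # foo bar
```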

I'm wondering: what's the disadvantage of just keeping all the spaces as they are? To be clear, I'm talking about inline parsing here, because some spaces will be removed during block parsing for list indentation etc.

Also, is it really worth adding several non-trivial rules about removing whitespace for situations that probably don't even occur in real-life documents? I personally don't start or end lines with spaces unless I want to achieve some specific formatting with them. And in the rare cases where I might inadvertently type an unnecessary space, I certainly don't mind if it just stays in the output document. On the contrary, wouldn't it actually be less surprising if the spaces stayed unchanged (unless they are used for block formatting or hard line breaks, of course)?

Jun 05 '17 21:06 mgeier

For reference, this has led to confusion before: https://github.com/jgm/CommonMark/issues/176.

Jun 18 '17 15:06 mgeier