cmark-gfm

setext heading after table

Open: UziTech opened this issue on Feb 10 '20 (15 comments)

A setext heading is a block-level element, but it does not seem to interrupt a table.

Is this by design, since a setext heading cannot interrupt a paragraph? Or should it be able to interrupt a table without being preceded by a blank line?

Discussion: https://github.com/markedjs/marked/pull/1598#discussion_r376831413

Example:

| abc | def |
| --- | --- |
| bar | foo |
| baz | boo |
title
=====

Actual output ("title" and "=====" become additional table rows):

abc def
bar foo
baz boo
title
=====

Expected output (the table ends after "baz boo", and "title" becomes a heading):

abc def
bar foo
baz boo

title

UziTech, Feb 10 '20

I would also like to know the answer to this. We have a couple of cases where this behavior seems to break the expected rules, but we can't be sure whether we should work around it or whether it's intended.

calculuschild, Feb 20 '20

@github @kivikakk any feedback on this?

UziTech, Mar 06 '20

I no longer work at GitHub, so I can’t help, sorry!

kivikakk, Mar 06 '20

@kivikakk thanks for the response. Do you happen to know who could help with this?

UziTech, Mar 06 '20

@UziTech Unfortunately not :( Your best bet is likely to contact support.

kivikakk, Mar 06 '20

I am not a cmark-gfm contributor either; I semi-regularly track its development, mainly as a maintainer of MD4C, for compatibility reasons.

Yet, as someone with some experience implementing Markdown parsers, let me voice a strong doubt about whether allowing a setext heading to interrupt a table is a good idea. My rationale follows.

Tables do not necessarily look as nice as the one in the first post. They may just look like this:

head1 | head2
---|---
value1

(Notice that table body rows do not have to contain a pipe at all.)

This renders the same as:

| head1  | head2  |
| ------ | ------ |
| value1 |        |
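The padding comes from the GFM table rule that a body row with fewer cells than the header gets empty cells appended, while excess cells are ignored. Here is a simplified Python sketch of that rule; it assumes no escaped pipes and no pipes inside code spans, which a real parser must handle, and the function names are hypothetical.

```python
def split_cells(line):
    """Split a table row on '|', dropping optional leading/trailing pipes."""
    s = line.strip()
    if s.startswith("|"):
        s = s[1:]
    if s.endswith("|"):
        s = s[:-1]
    return [cell.strip() for cell in s.split("|")]


def body_row(line, ncols):
    """Normalize a body row to the header's column count."""
    cells = split_cells(line)[:ncols]           # drop excess cells
    return cells + [""] * (ncols - len(cells))  # pad missing cells


header = split_cells("head1 | head2")
print(body_row("value1", len(header)))               # ['value1', '']
print(body_row("| value1 |        |", len(header)))  # ['value1', '']
```

Because any non-blank line yields at least one cell, nothing in this rule by itself stops a line such as "title" from becoming a row.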

If we allow a setext heading to interrupt the preceding table, we introduce a new problem: in general, you simply cannot tell which of the preceding lines still belong to the table and which should be part of the subsequent heading.

Or, from another perspective, consider how the CommonMark specification defines a setext heading: if the setext underline follows a paragraph, the whole paragraph becomes the heading (and the underline itself is consumed). Because the table extension allows rows with the pipes stripped, paragraphs cannot reasonably be allowed to interrupt tables.
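The CommonMark setext rule can be sketched in Python as follows; the regex and function name are illustrative assumptions, not code from any parser. The key point is that the heading takes the whole preceding paragraph, not just its last line.

```python
import re

# A setext underline: up to 3 spaces of indent, a run of '=' or '-',
# optional trailing whitespace, and nothing else.
SETEXT_UNDERLINE = re.compile(r"^ {0,3}(=+|-+)[ \t]*$")


def setext_heading(paragraph_lines, underline):
    """Return (level, text) if `underline` turns the paragraph into a heading."""
    m = SETEXT_UNDERLINE.match(underline)
    if not paragraph_lines or m is None:
        return None
    level = 1 if m.group(1)[0] == "=" else 2
    # Every line of the paragraph becomes heading content; the underline
    # itself is consumed and does not appear in the output.
    text = "\n".join(line.strip() for line in paragraph_lines)
    return level, text


print(setext_heading(["line1", "line2"], "====="))  # (1, 'line1\nline2')
```

If a table could hand its trailing rows to a following underline, a parser would first have to decide how many of those rows make up the "paragraph" above it.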

Changing this would require either that tables with stripped pipes behave differently (imho a bad idea for the sake of consistency) or that only lines containing at least one pipe are part of the table. That would make table parsing more complicated and slower, and I am not sure it would not expose other problems elsewhere.

Last but not least, imho, a table is in general what I informally call "a heavy-content block", similar in this respect to ordinary paragraphs: if we allow them in the text flow without any blank lines, they may even be hard for a human eye to notice in the raw Markdown input unless they are formatted really nicely. Consider this from section 1.1 of the CommonMark specification:

The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.

Because of this, my gut feeling is that paragraphs (or setext headings, which syntactically are just a paragraph followed by an underline) should not be allowed to interrupt tables.

(For the very same reason, I would argue the same for the opposite case, i.e. that tables shouldn't be allowed to interrupt paragraphs either. Interestingly enough, cmark-gfm currently behaves inconsistently in this scenario, as reported in #180.)

EDIT: Also consider that it would add new, very specific rules (exceptions), possibly complicating the implementation further, which would have to resolve crazy cases such as this one:

foo | bar
--- | ---
=====

mity, Mar 07 '20

@mity Thanks for the thoughtful response. Just to summarize your point, are you saying because setext headings are paragraphs with an underline and paragraphs take precedence over tables that it would be difficult to determine which is desired by the user in some of those crazy cases?

UziTech, Mar 10 '20

As a Markdown parser maintainer, I feel that requiring a pipe in every table row is something that should be done anyway. I know it is not a requirement now, but I feel it should be, given the Markdown design goal you quoted. It would make parsing tables much easier and less ambiguous.

As a Markdown user, I would expect something like the code below to be parsed as a table with just a header, followed by a setext heading below the table.

foo | bar
--- | ---
asd
===

If someone wanted the last two lines to be part of the table they could add pipes. It is easier to visually parse and better subscribes to the markdown design goal.

foo | bar
--- | ---
asd |
=== |
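The proposed rule (not current GFM behavior) can be modeled like this: a body row must contain at least one pipe, so the first pipe-less line ends the table and may then start a paragraph or setext heading. The Python sketch below uses hypothetical names and the same simplifications as before (no escaped pipes, and it only recognizes leftovers that form exactly one paragraph plus an underline).

```python
import re

DELIM_CELL = re.compile(r"^\s*:?-+:?\s*$")
SETEXT_UNDERLINE = re.compile(r"^ {0,3}(=+|-+)[ \t]*$")


def is_delimiter_row(line):
    cells = line.strip().strip("|").split("|")
    return all(DELIM_CELL.match(cell) for cell in cells)


def parse_pipe_required(lines):
    """Hypothetical rule: every body row must contain '|'.

    Returns (table_lines, heading_text_or_None, leftover_lines)."""
    if len(lines) < 2 or "|" not in lines[0] or not is_delimiter_row(lines[1]):
        return [], None, lines
    end = 2
    while end < len(lines) and lines[end].strip() and "|" in lines[end]:
        end += 1
    table, rest = lines[:end], lines[end:]
    # Leftover lines form a paragraph; an underline as the last line
    # turns that paragraph into a setext heading.
    if (len(rest) >= 2 and SETEXT_UNDERLINE.match(rest[-1])
            and all(line.strip() for line in rest[:-1])):
        return table, "\n".join(line.strip() for line in rest[:-1]), []
    return table, None, rest


print(parse_pipe_required(["foo | bar", "--- | ---", "asd", "==="]))
# (['foo | bar', '--- | ---'], 'asd', [])
print(parse_pipe_required(["foo | bar", "--- | ---", "asd |", "=== |"]))
# (['foo | bar', '--- | ---', 'asd |', '=== |'], None, [])
```

Under current GFM rules, the first input is instead a table with two body rows ("asd" and "==="), so adopting this rule would change how existing documents render.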

UziTech, Mar 10 '20

Or maybe this should be undefined behavior, since there should be a blank line between the two blocks.

UziTech, Mar 10 '20

Just to summarize your point, are you saying because setext headings are paragraphs with an underline and paragraphs take precedence over tables that it would be difficult to determine which is desired by the user in some of those crazy cases?

Yes, that's one of my arguments. Consider e.g. this:

A | B
---|---
line1
line2
line3
=====

You cannot reasonably determine which of line1, line2, and line3 are part of the table and which form the heading after it.
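To make the ambiguity concrete: if an underline could end a table, every split point among the pipe-less lines would be a plausible reading. This small Python snippet (an illustration only, not any parser's algorithm) enumerates them.

```python
def candidate_splits(body_lines):
    """Every (table_rows, heading_lines) split an interrupting underline
    would permit; the heading needs at least one line."""
    return [(body_lines[:k], body_lines[k:]) for k in range(len(body_lines))]


for rows, heading in candidate_splits(["line1", "line2", "line3"]):
    print(rows, "->", heading)
# [] -> ['line1', 'line2', 'line3']
# ['line1'] -> ['line2', 'line3']
# ['line1', 'line2'] -> ['line3']
```

With n such lines there are n readings, and nothing in the source text distinguishes them.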

mity, Mar 10 '20

If someone wanted the last two lines to be part of the table they could add pipes. It is easier to visually parse and better subscribes to the markdown design goal.

I might agree if such a change would not break tons of existing documents. But I suspect it would, because GFM has supported body rows without pipes for a long time.

mity, Mar 10 '20

Very good points. I see the ambiguity in the example with three lines. And I agree the change to require pipes would probably not work out well.

I think this is sufficiently resolved to close this issue.

@mity Thank you for your valuable input.

UziTech, Mar 10 '20

Is this something that should be added to the GFM spec, then? I think that is the core issue here: the spec is contradictory or, at a minimum, ambiguous.

calculuschild, Mar 10 '20

Is this something that should be added to the GFM spec, then?

Ideally, yes.

I think that is the core issue here: the spec is contradictory or at a minimum ambiguous.

Imho, all the GFM extensions are quite under-documented in the specification; this is just one example. The current reality is that if you want to be reasonably compatible with GFM, you have to study the cmark-gfm code and/or simply test how cmark-gfm parses problematic, ambiguous, or unspecified corner cases.
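One way to run such checks is to pipe corner cases through the cmark-gfm command-line tool with its table extension enabled. This minimal Python sketch assumes a cmark-gfm binary is on PATH and returns None otherwise; the helper name is hypothetical.

```python
import shutil
import subprocess


def render_gfm(markdown):
    """Render Markdown with the cmark-gfm CLI and its table extension.

    Returns the HTML output, or None if cmark-gfm is not installed."""
    exe = shutil.which("cmark-gfm")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--extension", "table"],
        input=markdown,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


html = render_gfm("foo | bar\n--- | ---\nasd\n===\n")
print(html if html is not None else "cmark-gfm not found")
```

Collecting such inputs and their outputs gives a small regression suite for comparing another parser against cmark-gfm.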

mity, Mar 10 '20

I created PR #185 to add an example showing that a setext heading does not break a table.

Let's hope someone at @github is actually watching this repo. 🤞

UziTech, Mar 10 '20