kramdown-rfc icon indicating copy to clipboard operation
kramdown-rfc copied to clipboard

Spurious irefs added after section when abbrev links to heading

Open CxRes opened this issue 2 years ago • 17 comments

This is a complicated one to explain. Please see the attached test case. When the *[Accept-Events]: abbrev links to #foo (L94), an id attached to a section heading, a spurious <iref> of a previously defined definition is introduced after the section heading Foo. Replace the #foo with an inline id, say #a-e (at the end of the previous para), the <iref> vanishes.

Further, instances of Accept-Events are not wrapped unlike other definitions. Again, this is remedied if the id is changed from #foo to #a-e.

Original: draft-gupta-httpbis-per-resource-events-test-2.md

Output: draft-gupta-httpbis-per-resource-events-test-2.xml.txt


Also, it seems the conversion wraps code elements (inside backticks) are wrapped with <spanx> and not <tt> even with -3 flag. Thus, the engine does not replace any Accept-Event instances that are wrapped in back ticks (even when linking to a non-heading id).

Actual output <spanx style="verb">Accept-Events</spanx>

Desired output <tt><iref item="Accept-Events"/><xref format="none" target="a-e">Accept-Events</xref></tt>

CxRes avatar Sep 11 '23 20:09 CxRes

@cabo If you are back at work, can you please have a look at this issue. Thanks!

CxRes avatar Sep 22 '23 03:09 CxRes

Also, it seems the conversion wraps code elements (inside backticks) are wrapped with <spanx> and not <tt> even with -3 flag. Thus, the engine does not replace any Accept-Event instances that are wrapped in back ticks (even when linking to a non-heading id).

Actual output <spanx style="verb">Accept-Events</spanx>

Desired output <tt><iref item="Accept-Events"/><xref format="none" target="a-e">Accept-Events</xref></tt>

Unfortunately, code elements (backticks) are protected from the *[...] support that kramdown provides. I don't think this is easy to change, as backticks have several such protection semantics, and disabling the protection just for this processing may not be easy.

Looking at the other part of the report now; I'm not sure yet I understand what you expect to see.

cabo avatar Sep 22 '23 05:09 cabo

You can work around the protection afforded by backticks by manually adding the <tt> and </tt> tags around the phrase, as in

<tt>Accept-Event</tt>

cabo avatar Sep 22 '23 06:09 cabo

Looking at the other part of the report now; I'm not sure yet I understand what you expect to see.

Corresponding to line 94 to 96 in the markdown: In line 105 of the XML, the following <iref item='application client' primary='true'/> should not exist.

You can work around the protection afforded by backticks by manually adding the and tags around the phrase

That is exactly what I am doing in my actual draft. Just makes the text unmarkdownish!

CxRes avatar Sep 22 '23 06:09 CxRes

Please try 1.6.42 for the weird shift-around.

I'll continue to look for ways to get code spans into autolink/iref, but I'm not optimistic.

cabo avatar Sep 22 '23 13:09 cabo

Unfortunately, its not fixed.

New output on the same markdown draft-gupta-httpbis-per-resource-events-test-2.2.xml.txt

My understanding is than an <iref> on line 105 should not exist. Further, Accept-Events are no longer getting replaced at all. See line 107 and 109 in the new draft and diff that with the old.

CxRes avatar Sep 22 '23 14:09 CxRes

So you don't want the functionality of autolink-iref-cleanup: true at all? Set it to false then.

cabo avatar Sep 22 '23 14:09 cabo

What does autolink-iref-cleanup do?

CxRes avatar Sep 22 '23 14:09 CxRes

What does autolink-iref-cleanup do?

It essentially suppresses auto-irefs inside the section that is also auto-linked (the #foo section in your case). We considered it confusing for an occurrence inside a section to reference that very section in an auto-link. Instead, the iref is moved up to that section, and the links are removed.

cabo avatar Sep 22 '23 14:09 cabo

Interesting... you really ought to document that!

It did not occur to me to put Accept-Events in another section to test.

CxRes avatar Sep 22 '23 14:09 CxRes

Interesting... you really ought to document that!

Absolutely, and I'll keep this ticket open until that is done.

cabo avatar Sep 22 '23 15:09 cabo

I'll continue to look for ways to get code spans into autolink/iref, but I'm not optimistic.

In that case, I would need your help finding some kind of hack. I am defining a field called "Events", which I am wrapping in <tt>, which I want indexed. However, I do not want every instance of the word Events to be indexed.

What I would like to do is this, but it does not work:

*[<tt>Events</tt>] #events-header (((Events)))

What I have tried is to use <tt>{{&zwsp}}Events</tt> and then define the index replacement as:

*[{{&zwsp}}Events] #events-header (((Events)))
*[&zwsp;Events] #events-header (((Events)))
*[<zero-space-typed-out-in-unicode>Events] #events-header (((Events)))

None of these have worked! Any ideas?

(I am happy to take this discussion elsewhere if you feel it is not appropriate here)

CxRes avatar Sep 23 '23 08:09 CxRes

The abbrev feature of upstream kramdown does not mix well with entity references. If you actually type out the zwsp (or any other character), it works.

Of course, with many editors these native zwsps are hard to see in the manuscript, and they get in the way in the HTML form (copy/paste gets them, searches don't always work, etc.). So I'm thinking of providing a way to designate certain Unicode characters as to be suppressed in the output, so you can use them for such a purpose and remove them from the final XML.

So then you would

*[⁜Events]: #events-header (((Events)))

This sentence is about the ⁜Events that need to be cited in the index, and other Events that don't.

and declare:

character:
  ⁜: [delete]

in the YAML header so the ⁜ is not actually shown after processing. (This would go with a general Unicode character feature, which will allow to warn about unexpected characters (e.g., typographic quotes) etc. , which needs a way to declare non-ASCII-characters as "expected".)

Feedback welcome...

(I'm also still thinking about a way to have abbrevs reach into backticks, but my simple prototype for this doesn't work yet.)

cabo avatar Sep 23 '23 12:09 cabo

IMHO, the cleanest way to solve this would be to preserve Kramown's abbreviation feature. So then you could do:

*[=Events=]: #events-header (((Events))) Events

This sentence is about the <tt>=Events=</tt> that need to be cited in the index, and other Events that don't.

or even

*[=Events=]: <tt> #events-header (((Events))) Events </tt>

This sentence is about the =Events= that need to be cited in the index, and other Events that don't.

I don't know if this is feasible to implement. Also, this would be a breaking change, but it could be possibly be implemented under a flag! It also circumvents the backticks problem. What do you think?

CxRes avatar Sep 23 '23 13:09 CxRes

Experimental feature: In 1.6.43, you can now use

*[=Events=]: #header-fields (((Events))) `Events`

Any text added to an abbrev definition beyond what is already allowed now becomes replacement text (in markdown form) for what was matched.

Known bugs: Combining the <nobr> flag with complex XML replacements can crash kramdown-rfc or generate invalid XML output at this point.

cabo avatar Sep 23 '23 16:09 cabo

Just tested it out. This is AWESOME!

Thank you so much for fixing these issues. Much Appreciated!!!!!!!

CxRes avatar Sep 23 '23 17:09 CxRes

I have added documentation for the flag in the Syntax2 wiki. Once you finish, I will add the experimental part as well.

CxRes avatar Sep 23 '23 18:09 CxRes