Android-FileBrowser-FilePicker icon indicating copy to clipboard operation
Android-FileBrowser-FilePicker copied to clipboard

attr_list breaks if the value contains braces, requires HTML-encoding them

Open iBug opened this issue 2 years ago • 16 comments
trafficstars

This issue originated from squidfunk/mkdocs-material#6177 where I was exploring the "override copied text" feature.

The following code block breaks Markdown parsing:

``` { .c data-copy="int main() { return 0; }" }
Try copying me for some C code
```

It doesn't quite make sense to me as everything is already inside double quotes.

Instead, HTML-encoding the braces (and you can add newlines) solves the problem:

``` { .c data-copy="int main() {
  return 0;
&#125" }
Try copying me for some C code
```

Using HTML-encoding is not intuitive to most users. Maybe this could get changed to Python escapes?

iBug avatar Oct 12 '23 09:10 iBug

Interesting issue. When the feature was implemented, we did not foresee any sort of formatted text being passed in as a value, which explains why you need to jump through hoops to make it work. For example, we might expect a class name or a value for an HTML title attribute. In either case, there would be no markup. Although, I suppose a title attribute could conceivably contain curly braces. I'll have to think about this.

Regarding the specific feature of mkdocs-material; I find it very strange that the feature even exists. I can't imagine any scenario where anyone would want that specific feature. Of course, use cases should not be limited by my imagination (or lack thereof), but it does give me pause. I will certainly not be adding specific support for newlines or other advanced formatting. The only thing I would consider here at all is to maybe allow curly braces within quoted values.

The tricky thing about this is that the initial regex does not parse the keys and values. It simply grabs the entire string (between the curly braces) and then passes that on to a separate key/value parser. We currently grab that string with \{(?P<attrs>[^\}\n]*)\} which does not allow any closing curly braces within the string, We would need to change that portion of the regex to be more greedy and match to the last closing curly brace on the line (maybe \{(?P<attrs>[^\n]*)\} or similar).

I'm curious how Superfences handles this scenario. Any input from @facelessuser or @squidfunk is welcome.

waylan avatar Oct 16 '23 18:10 waylan

Thanks for the evaluation, @waylan.

Regarding the specific feature of mkdocs-material; I find it very strange that the feature even exists. I can't imagine any scenario where anyone would want that specific feature. Of course, use cases should not be limited by my imagination (or lack thereof), but it does give me pause. I will certainly not be adding specific support for newlines or other advanced formatting. The only thing I would consider here at all is to maybe allow curly braces within quoted values.

A user wanted to explicitly set the text for the clipboard button, which we implemented in https://github.com/squidfunk/mkdocs-material/issues/6086. We consider this to be highly experimental and are considering removing this functionality again, if it turns out to be complicated for users to use. If you follow the issue, you can learn that the use case is that some users have comments in their code blocks which they don't want to be copied.

If this turns out to be too involved to implement, or it can only be partially solved (e.g. curly braces are okay, but everything else needs to be escaped) we will remove the feature again, since we don't consider the necessity for escape everything with HTML entities to be very user-friendly. It's too prone to errors.

squidfunk avatar Oct 16 '23 19:10 squidfunk

SuperFences uses attr_lists if enabled, to handle parsing code block headers of this sort. I don't personally ever use anything complicated that requires { or } in my attributes, nor have I had a need so far. I am generally indifferent as I don't use the data-copy feature that Mkdocs Material has implemented either.

With that said, I can see how and why someone would desire sane handling. I do think it is possible to craft a solution that could consume strings properly. But I also would probably not care if it was decided { and } handling wasn't fixed. Again, I don't stress the attribute system in this way currently. I guess I can't rule out the possibility that one day I may run into such an issue, but if I did, since HTML escapes can navigate me around the issue, I'd probably be okay for the 1 or 2 times I may ever need it.

facelessuser avatar Oct 16 '23 19:10 facelessuser

@squidfunk thanks for the explanation. I commented on your issue with a suggestion for a different way to solve your problem. Whether you go that route or another is your choice. But the best you can hope for from us here is to allow curly braces in values. All other limitations will remain.

@facelessuser the one question I was hoping you would answer you didn't. The attr_list parser is not the issue, The issue is that fenced_code doesn't pass the curly braces on to attr_list. I was wondering if maybe superfences wasn't so restrictive.

waylan avatar Oct 16 '23 20:10 waylan

@waylan It passes whatever is in the attribute to attr_list. This is no different than what using attr_list on a header does.

If use the following, attr_list fails to parse the attributes and they are not applied.

# Test {data-test="{}"}

When using Superfences, such an attribute would cause parsing to fail as well. If fenced_code handles it special, we do not currently model that behavior.

facelessuser avatar Oct 16 '23 20:10 facelessuser

If I were to fix this, this seems sufficient:

-((\{(?P<attrs>[^\}\n]*)\})|
+((\{(?P<attrs>[^\n]*)\})|

As it is currently written, the only thing that may come after the attr_list on the opening line is hl_lines, which I would not consider a candidate for bringing braces. Removing } from that character class allows greedy matching while staying on the opening line. Any ideas?

iBug avatar Oct 16 '23 21:10 iBug

@facelessuser the attr_list parser handles curly braces just fine.

>>> from markdown.extensions.attr_list import get_attrs
>>> get_attrs('data-test="{}"')
[('data-test', '{}')]

Like fenced_code, headers fail because the regex which grabs the attribute list is failing, not the attr_list parser itself. And that is why I ask what Superfences behavior is. It's possible that the method you use to identify the list before passing to the list parser works differently and this is a nonissue.

waylan avatar Oct 17 '23 13:10 waylan

@waylan I was comparing the actual processing of the values in Markdown:

import markdown

MD = R"""
## test {data-test="{}"}
"""

print(markdown.markdown(MD, extensions=['attr_list']))
<h2>test {data-test="{}"}</h2>

In general, we were being consistent with how attr_list was handled everywhere else, but yes, we could adjust our own pattern to deviate. If get_attrs handles them fine, we could possibly relax our code patterns.

facelessuser avatar Oct 17 '23 13:10 facelessuser

I think as far as SuperFences is concerned, we have a clear starting indicator (```), so we can probably just relax the pattern.

For normal inline patterns, attr_list would need to have a more precise pattern to avoid gobbling a whole line. Again, I don't know if attr_list plans to fix this for other cases, but at least as far as code blocks are concerned, this may be an easy fix for SuperFences.

In the worst case, if we find undesirable examples of it being too greedy, we could craft a more explicit rule for ourselves.

facelessuser avatar Oct 17 '23 13:10 facelessuser

The change has been made in SuperFences.

facelessuser avatar Oct 17 '23 14:10 facelessuser

For normal inline patterns, attr_list would need to have a more precise pattern to avoid gobbling a whole line. Again, I don't know if attr_list plans to fix this for other cases, but at least as far as code blocks are concerned, this may be an easy fix

This is what gives me pause. While it is an easy fix for fenced code blocks, it introduces an inconsistency with all other attr_lists. And to adjust the others would be a much more involved process.

waylan avatar Oct 17 '23 17:10 waylan

Yep, it is definitely more involved. While I think possible, and maybe even doable as a single (much more complex) pattern, it is not nearly as simple.

facelessuser avatar Oct 17 '23 17:10 facelessuser

FYI, we removed this feature from our documentation. The current implementation is too fragile and will likely lead to many problems when being used by non-technical users. We don't feel comfortable providing support for it at the current stage.

squidfunk avatar Oct 18 '23 15:10 squidfunk

Well, regardless, the next version of pymdown-extensions will handle braces better in code block attributes.

facelessuser avatar Oct 18 '23 15:10 facelessuser

@squidfunk thanks for the update. Given the fact that (1) there is no officially supported need for it, (2) the proposed modification would introduce a inconsistency between attr_lists on fenced code blocks compared to all other elements, and (3) Superfences provides a working solution for those who really want it, I'm inclined to not make a change to fenced code blocks at this time.

That said, if someone were to submit a PR which consistently altered the attr_list behavior across all elements to allow curly braces, I would be wiling to give it consideration. Therefore, I'm not going to close this but will give it the [someday-maybe] label.

waylan avatar Oct 18 '23 18:10 waylan

That said, if someone were to submit a PR which consistently altered the attr_list behavior across all elements to allow curly braces, I would be wiling to give it consideration.

  • Opened #1414

I believe it satisfies this requirement

oprypin avatar Nov 11 '23 11:11 oprypin