Allow text and regex show rules to match across styled content
Description
Currently, text and regex show rules don't match when using styled content, as in this example:
#show regex("an \w*"): underline
This is an example with regex show rules!
This is an _example_ with regex show rules!
It would be very useful if the show rule would match across styled content. To begin with, this could be limited to elements which have a clear text representation, such as emph in this case. For this, Typst would build an internal plain-text representation of the sequence by joining the plain-text representations of all its elements (e.g. emph.body is text), and then match the regex against that representation. Such an internal plain-text representation is already used to build PDF bookmarks.
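To make the idea concrete, here is a minimal sketch in Python. It is only a model and not how Typst is implemented: the `runs` list of `(text, styles)` pairs and the helper names `plain_text`, `find_matches` and `find_matches_per_run` are assumptions made up for illustration.

```python
import re

# Hypothetical model: a paragraph is a list of (text, styles) runs.
runs = [
    ("This is an ", frozenset()),
    ("example", frozenset({"emph"})),
    (" with regex show rules!", frozenset()),
]

def plain_text(runs):
    """Join the plain-text representation of every run."""
    return "".join(text for text, _ in runs)

def find_matches(runs, pattern):
    """Match against the joined text, so run boundaries are invisible to the regex."""
    return [(m.start(), m.end(), m.group())
            for m in re.finditer(pattern, plain_text(runs))]

def find_matches_per_run(runs, pattern):
    """For contrast: match each run on its own, so no match can cross a style change."""
    return [m.group() for text, _ in runs for m in re.finditer(pattern, text)]

print(find_matches(runs, r"an \w*"))          # [(8, 18, 'an example')]
print(find_matches_per_run(runs, r"an \w*"))  # ['an ']
```

The offsets returned by `find_matches` refer to the joined text, so they would still have to be mapped back onto the individual runs before a show rule could be applied.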
For simplicity, text and regex should not match across linebreaks and parbreaks, i.e. the plain-text representation should start anew at every linebreak and parbreak. This would then also give ^ more meaning (see https://github.com/typst/typst/issues/5273).
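A rough Python sketch of this restart behaviour, under the assumption that the content has already been flattened into a stream of strings and break markers. `BREAK`, `segments` and `match_per_segment` are hypothetical names, and treating `^` as the start of a segment is just one possible reading of the linked issue.

```python
import re

# Hypothetical flattened stream: strings are text, BREAK marks a linebreak or parbreak.
BREAK = object()
stream = ["an apple", BREAK, "is not ", "an orange"]

def segments(stream):
    """Yield one plain-text string per stretch between breaks."""
    current = []
    for item in stream:
        if item is BREAK:
            yield "".join(current)
            current = []
        else:
            current.append(item)
    yield "".join(current)

def match_per_segment(stream, pattern):
    """Run the regex on each segment separately, so no match can span a break."""
    return [m.group() for seg in segments(stream) for m in re.finditer(pattern, seg)]

print(match_per_segment(stream, r"an \w*"))   # ['an apple', 'an orange']
print(match_per_segment(stream, r"^an \w*"))  # ['an apple']
print(match_per_segment(stream, r"apple is")) # []
```

In this model `^` only matches directly after a break (or at the very start), and a pattern such as `apple is` finds nothing because its two halves sit in different segments.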
For cases where multiple show rules match partially overlapping pieces of text, it will be necessary to split element functions. E.g. show "This is": underline together with show "is an": upper would require underline to be split so that upper can be applied, i.e. an equivalent to #underline[This ]#upper[#underline[is] an].
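The splitting can be sketched in Python as cutting the text at every match boundary and collecting the styles that cover each piece. This is an illustrative model only: `apply_rules` is a hypothetical helper, it handles plain text rules (no regex), and it flattens the nesting of underline and upper into an unordered set of style names.

```python
def apply_rules(text, rules):
    """rules: list of (needle, style). Returns runs of (substring, sorted styles)."""
    # Collect every non-overlapping match span of each rule.
    spans = []
    for needle, style in rules:
        start = text.find(needle)
        while start != -1:
            spans.append((start, start + len(needle), style))
            start = text.find(needle, start + len(needle))
    # Cut the text at every span boundary.
    cuts = sorted({0, len(text)} | {p for s, e, _ in spans for p in (s, e)})
    runs = []
    for a, b in zip(cuts, cuts[1:]):
        # A piece carries every style whose span fully covers it.
        styles = tuple(sorted(st for s, e, st in spans if s <= a and b <= e))
        runs.append((text[a:b], styles))
    return runs

runs = apply_rules(
    "This is an example",
    [("This is", "underline"), ("is an", "upper")],
)
for run in runs:
    print(run)
# ('This ', ('underline',))
# ('is', ('underline', 'upper'))
# (' an', ('upper',))
# (' example', ())
```

The three styled pieces correspond to the equivalent markup given above; the order in which underline and upper are nested is not captured by this model.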
Use Case
Any use cases where text and regex show rules currently break or behave unexpectedly due to styled content. More specialized use cases include sentence casing (see https://github.com/alerque/decasify/issues/38) and customization of bibliographies (see this forum post).
The problem applies to all show rules, not just regex: https://github.com/typst/typst/issues/5353
That seems to be a different issue, but I have edited this issue to include text show rules. For element and label show rules, I don't see how this feature request would apply, though.
It is the same underlying issue for all linked issues.
It would be very useful if the show rule would match across styled content. To begin with, this could be limited to elements which have a clear text representation, such as emph in this case.
That's not really the problem, as the emph will already be shown and turned into italic text at the point where the text show rule is applied.
The central problem is that it's unclear how the resulting content should be styled. Right now, the style must be uniform and the replacement content will also have that style. If it goes across a style change, does it take the style of the first or second or neither? It's not really well-defined.
It's not really well-defined
In one of the use cases mentioned in the issue (my decasify package) the expected outcome is quite well defined. The existing style segments should preserve their style, but enough context should be passed to the casing functions such that the casing rules can be analyzed over the whole string, not restarting at each style change.
I'm not sure that this issue isn't actually two totally separate problems though: the use case for text transformations might be different than the use case for applying Typst styling to the output.
The central problem is that it's unclear how the resulting content should be styled. [...] If it goes across a style change, does it take the style of the first or second or neither? It's not really well-defined.
@laurmaedje It should just preserve the style as-is, i.e. pass the match [an _example_] as styled content to the show rule (instead of the plain text). Then it's either wrapped as a whole into underline or whatever, or it's left to the user-defined function to process the content (or to only process it if it's text).
The only difficulty is, if the style and match overlap only partially, such as e.g.
#show regex("an \w*"): underline
This is an _example with_ regex show rules!
Here, the style must first be split as explained above, then again [an _example_] is passed to the show rule as styled content, finally resulting in
This is #underline[an _example_]_ with_ regex show rules!
To my understanding, this is well-defined:
- If there is a style spanning across the match start/end, then first split it at the match start/end
- Pass the match as styled content to the show rule
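As a sanity check, the two rules above can be modelled in a few lines of Python. This is a sketch under assumptions, not Typst's implementation: content is a list of `(text, styles)` runs, only the first match is handled, and `split_at`, `apply_show_rule` and `underline` are hypothetical helpers.

```python
import re

def split_at(runs, pos):
    """Split the run containing offset pos so that pos falls on a run boundary."""
    out, offset = [], 0
    for text, styles in runs:
        end = offset + len(text)
        if offset < pos < end:
            cut = pos - offset
            out.append((text[:cut], styles))
            out.append((text[cut:], styles))
        else:
            out.append((text, styles))
        offset = end
    return out

def apply_show_rule(runs, pattern, transform):
    m = re.search(pattern, "".join(text for text, _ in runs))
    if m is None:
        return runs
    # Rule 1: split any style that spans the match start or end.
    runs = split_at(split_at(runs, m.start()), m.end())
    # Rule 2: pass the matched runs, styles intact, to the show rule.
    before, matched, after, offset = [], [], [], 0
    for text, styles in runs:
        if offset < m.start():
            before.append((text, styles))
        elif offset < m.end():
            matched.append((text, styles))
        else:
            after.append((text, styles))
        offset += len(text)
    return before + transform(matched) + after

def underline(matched):
    """Wrap the whole match: add 'underline' to every matched run."""
    return [(text, tuple(sorted(set(styles) | {"underline"}))) for text, styles in matched]

runs = [("This is an ", ()), ("example with", ("emph",)), (" regex show rules!", ())]
result = apply_show_rule(runs, r"an \w*", underline)
for run in result:
    print(run)
# ('This is ', ())
# ('an ', ('underline',))
# ('example', ('emph', 'underline'))
# (' with', ('emph',))
# (' regex show rules!', ())
```

The emph run is cut into "example" and " with", and only the first half ends up inside the underline, which mirrors the expected result shown above.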
Can you think of a counter-example where these two simple rules lead to undefined behaviour?
One can argue that the splitting of styles might lead to unexpected (though not undefined) results, and maybe it is okay to exclude this case for the time being and only match if styled content is completely enclosed by the match. However, there is in general no way to get two partially overlapping styles without splitting either, so IMHO it is fine from a design point of view.
Some more examples to demonstrate:
#show regex("an \w*"): it => ...
Here again, whatever (styled or uniform) content matches the regex is passed to the function as styled content, leaving it to the user to do whatever they want with it.
#show regex("an \w*"): "A MATCH"
Here, the match is just replaced by the (unstyled) text, i.e. it is expected that no style is preserved. The exception is a style that completely encloses the match, since that style is then applied after the show rule:
This is an _example_ with regex → This is A MATCH with regex
This _is an example with_ regex → This _is A MATCH with_ regex
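One way to read this behaviour, sketched in Python: the replacement keeps only those styles that cover every character of the match. That reading is an assumption on my part, as are the `(text, styles)` run model and the hypothetical helper `replace_match`, which handles the first match only.

```python
import re
from itertools import groupby

def replace_match(runs, pattern, replacement):
    # Expand runs to one (char, styles) entry per character.
    chars = [(ch, frozenset(styles)) for text, styles in runs for ch in text]
    m = re.search(pattern, "".join(ch for ch, _ in chars))
    if m is not None and m.end() > m.start():
        matched = chars[m.start():m.end()]
        # Assumption: a style "encloses" the match if it covers every matched character.
        common = frozenset.intersection(*(styles for _, styles in matched))
        chars[m.start():m.end()] = [(ch, common) for ch in replacement]
    # Merge neighbouring characters with identical styles back into runs.
    return [("".join(ch for ch, _ in group), tuple(sorted(styles)))
            for styles, group in groupby(chars, key=lambda c: c[1])]

partial = [("This is an ", ()), ("example", ("emph",)), (" with regex", ())]
enclosed = [("This ", ()), ("is an example with", ("emph",)), (" regex", ())]

print(replace_match(partial, r"an \w*", "A MATCH"))
# [('This is A MATCH with regex', ())]
print(replace_match(enclosed, r"an \w*", "A MATCH"))
# [('This ', ()), ('is A MATCH with', ('emph',)), (' regex', ())]
```

In the first case the emphasis covers only part of the match and is dropped; in the second it covers the whole match and survives the replacement.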
But losing style information makes sense here, as e.g. one could replace it with explicitly styled content
#show regex("an \w*"): [A *MATCH*]
It should just preserve the style as-is, i.e. pass the match [an _example_] as styled content to the show rule (instead of the plain text). Then it's either wrapped as a whole into underline or whatever, or it's left to the user-defined function to process the content (or to only process it if it's text).
This could work! The .text field wouldn't be accessible then, but that's kinda on the user then.
The only difficulty is, if the style and match overlap only partially, such as e.g.
I think this wouldn't actually be a problem, as the emphasis is already realized into an italic style then, which can easily be applied to multiple elements (the user will just see opaque styled elements).