Atributika
Adds start and end character positions to the tag structure, available to tag transformers
This change introduces tag character positions, relative to the original string, as part of the Tag structure. These positions can then be used by TagTransformer transformation functions.
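A minimal Swift sketch of the idea, assuming a pared-down tag type that records where each opening tag's markup sits in the source string. PositionedTag, PositionedTagTransformer and scanOpeningTags are hypothetical names, not Atributika's actual API; the fields this change adds to Tag may be named and typed differently, and the naive scanner here ignores attributes and edge cases a real parser handles.

```swift
// Hypothetical stand-in for a Tag that carries its position in the original string.
struct PositionedTag {
    let name: String
    // Range of the tag's markup (e.g. "<iframe src=...>") within the original string.
    let range: Range<String.Index>
}

// Hypothetical transformer shape: the closure now receives the positioned tag.
struct PositionedTagTransformer {
    let tagName: String
    let transform: (PositionedTag) -> String
}

// Naive scanner (assumption, for illustration only): records every opening tag
// together with its range in `html`, skipping closing tags such as "</b>".
func scanOpeningTags(in html: String) -> [PositionedTag] {
    var tags: [PositionedTag] = []
    var searchStart = html.startIndex
    while let open = html[searchStart...].firstIndex(of: "<"),
          let close = html[open...].firstIndex(of: ">") {
        let end = html.index(after: close)
        let inner = html[html.index(after: open)..<close]
        if !inner.hasPrefix("/") {
            // The tag name is everything up to the first space.
            let name = inner.split(separator: " ").first.map { String($0) } ?? ""
            tags.append(PositionedTag(name: name, range: open..<end))
        }
        searchStart = end
    }
    return tags
}

let html = "Hi <b>there</b> <iframe src=\"x\"></iframe>"
let tags = scanOpeningTags(in: html)

// A transformer can now report (or act on) where the tag appeared in the source.
let iframeTransformer = PositionedTagTransformer(tagName: "iframe") { tag in
    let offset = html.utf16.distance(from: html.utf16.startIndex, to: tag.range.lowerBound)
    return "[\(tag.name) at UTF-16 offset \(offset)]"
}
```

Here `iframeTransformer.transform(tags[1])` yields "[iframe at UTF-16 offset 16]", since the iframe's opening tag starts 16 UTF-16 code units into the string.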
It may not be immediately obvious why this change is useful, but I have found it valuable for extracting content that isn't suitable for attributed string transformation from within content that is.
For example, if the HTML being transformed is mostly transformable content but contains an iframe tag or a Twitter blockquote somewhere within it, the positions of these tags (opening and/or closing) have been useful for splitting the content, extracting those parts, and handling them separately.
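The splitting step can be sketched as follows, assuming we already know where the embedded element's opening tag starts and where its closing tag ends. Segment and splitOut are hypothetical names; in this demo the two positions are found with Foundation's `range(of:)` purely as a stand-in for the positions the tag structure would provide.

```swift
import Foundation

// Hypothetical segment type separating the two kinds of content.
enum Segment: Equatable {
    case attributed(String)  // ordinary markup, passed on to attributed-string conversion
    case embedded(String)    // e.g. an <iframe> or Twitter blockquote, handled separately
}

// Splits `html` into the part before `start`, the embedded element
// (from the start of its opening tag to the end of its closing tag), and the rest.
func splitOut(_ html: String, from start: String.Index, to end: String.Index) -> [Segment] {
    var segments: [Segment] = []
    let before = html[..<start]
    let embedded = html[start..<end]
    let after = html[end...]
    if !before.isEmpty { segments.append(.attributed(String(before))) }
    segments.append(.embedded(String(embedded)))
    if !after.isEmpty { segments.append(.attributed(String(after))) }
    return segments
}

let html = "<p>Intro</p><iframe src=\"v.html\"></iframe><p>Outro</p>"
// Stand-ins for the opening tag's start and the closing tag's end positions.
let start = html.range(of: "<iframe")!.lowerBound
let end = html.range(of: "</iframe>")!.upperBound
let parts = splitOut(html, from: start, to: end)
```

The result is three segments: the intro paragraph and outro paragraph as `.attributed`, and the complete iframe markup as `.embedded`, ready to be rendered by a different mechanism.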
I've used emoji variations in the unit test so that it includes multi-scalar grapheme clusters, ensuring the String.Index values handle these correctly (via UTF-16).
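Why grapheme clusters matter here can be shown with a short sketch, assuming positions are exchanged as UTF-16 offsets (the unit NSString and NSAttributedString ranges use) and converted to String.Index. The helper name characterIndex is hypothetical, and the particular emoji (a heart with variation selector U+FE0F and a thumbs-up with a skin-tone modifier) are illustrative choices, not necessarily the ones used in the test.

```swift
// Converts a UTF-16 offset into a String.Index. Returns nil if the offset is
// out of bounds or does not fall on a Character (grapheme cluster) boundary.
func characterIndex(in s: String, utf16Offset: Int) -> String.Index? {
    guard utf16Offset >= 0,
          let i = s.utf16.index(s.utf16.startIndex, offsetBy: utf16Offset,
                                limitedBy: s.utf16.endIndex)
    else { return nil }
    // samePosition(in:) yields nil when `i` is not on a Character boundary.
    return i.samePosition(in: s)
}

// "A", heart + U+FE0F, "B", thumbs-up + skin-tone modifier, "C".
let text = "A\u{2764}\u{FE0F}B\u{1F44D}\u{1F3FD}C"

// 5 Characters, but 9 UTF-16 code units: the counts diverge, so a
// Character offset cannot be used where a UTF-16 offset is expected.
let characterCount = text.count
let utf16Count = text.utf16.count

let cIndex = characterIndex(in: text, utf16Offset: 8)!
// Converting back: the String.Index maps to UTF-16 offset 8 but Character offset 4.
let backToUTF16 = text.utf16.distance(from: text.utf16.startIndex, to: cIndex)
let asCharacters = text.distance(from: text.startIndex, to: cIndex)
```

Going through the UTF-16 view keeps the offsets consistent with Foundation's ranges, while the resulting String.Index still addresses whole Characters such as the two emoji clusters.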