Purifier duplicates anchor tags - trix-editor generated html
Hello,
I am implementing htmlpurifier to sanitize the html generated by the basecamp:trix-editor. After purifying, the anchor tag is duplicated in 5 different places (apparently inside every other element).
Here is the code I used: https://gist.github.com/AntonioPrimera/ab8646567bac2b81c09298a83bc50532
Here is the original html: https://gist.github.com/AntonioPrimera/4e6c421aecc94589474d6b2602282505
And here is the purified html: https://gist.github.com/AntonioPrimera/ca5f6c6e1c93231a78f588fb82f4c57d
This seems like a bug rather than a wrong configuration; I think no configuration should cause tags to be duplicated. Any help / hint is highly appreciated.
PS: I have also tried disabling Tidy via `$config->set('HTML.TidyLevel', 'none')`, but the results are the same with or without the Tidy functionality.

PS2: When I remove the figure and the figcaption elements from the config, the anchor tag is not multiplied. Is it maybe because of the way the anchor tag is defined: strict, accepting just text / img tags inside?
I found the issue and a solution for my particular problem, but I still consider the duplication of elements a bug.
The issue is that the anchor tag is defined as "Inline" and the figure and figcaption elements are defined as "Block", so apparently when a block element is found inside an inline element, the Purifier has this buggy (or shall we call it "undocumented") behaviour, multiplying the inline element.
Probably the best fix would be to simply drop any block elements found inside an inline element, which would keep the behaviour strict.
My solution was to define the figure and figcaption as "Inline" elements.
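For anyone landing here with the same problem, the workaround can be sketched roughly as follows. This is a minimal sketch and not the exact code from the gist: it assumes HTML Purifier is installed via Composer, the definition ID `trix-figure-inline`, the `Flow` content model, and the sample input are all made up for illustration.

```php
<?php
// Assumption: HTML Purifier was installed with Composer (ezyang/htmlpurifier).
require_once __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();

// A definition ID is required before the raw HTML definition can be edited.
// The ID below is a hypothetical name; bump the revision whenever the
// custom definition changes so the cached copy is rebuilt.
$config->set('HTML.DefinitionID', 'trix-figure-inline');
$config->set('HTML.DefinitionRev', 1);

// maybeGetRawHTMLDefinition() returns null when a cached definition exists,
// so the elements are only registered when the definition is (re)built.
if ($def = $config->maybeGetRawHTMLDefinition()) {
    // Registering both elements as "Inline" (instead of "Block") means they
    // are no longer treated as block content nested inside the inline <a>.
    $def->addElement('figure', 'Inline', 'Flow', 'Common');
    $def->addElement('figcaption', 'Inline', 'Flow', 'Common');
}

$purifier = new HTMLPurifier($config);

// Hypothetical input resembling an attachment wrapped in a link.
$dirtyHtml = '<a href="https://example.com/image.png">'
    . '<figure><img src="https://example.com/image.png" alt="">'
    . '<figcaption>Caption</figcaption></figure></a>';

echo $purifier->purify($dirtyHtml);
```

Note that this only changes how the purifier classifies the two elements; it does not change how browsers render them.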
Awesome, thanks @AntonioPrimera! Really appreciate you looking into this.
EDIT: Wrong repository :sweat_smile: