ALR hashes are injected into the first matching tag found, which might not be the targeted one
Context
As part of this change on the LRC feature, one of the issues identified with the implementation is that a hash might be injected into the wrong opening tag. The current strategy is to parse the HTML with DOMDocument to identify the opening tags that should receive a hash. We then take each opening tag, look for it in the actual HTML, and add the hash to the first matching occurrence. If a tag identical to the targeted one appears earlier in the HTML and did not get a hash because it is not eligible (typically because it is too deep), we inject the hash into the wrong tag. Here is an example (from this fixture):
<html>
<head>
<title>Original</title>
</head>
<body>
<main>
<article>
<header>
<h1>Original</h1>
<div></div>
</header>
<section>
<h2>Text</h2>
<p>Original content</p>
</section>
</article>
</main>
<footer>
<div>
<p>Original footer</p>
</div>
</footer>
</body>
</html>
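The first <div> (inside <header>) is identical to the <div> inside <footer>, so if the former is not eligible and the latter is the targeted one, the hash is injected into the wrong tag.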
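The first-match behavior described above can be reproduced with a minimal sketch. Everything here is illustrative rather than the plugin's actual code: `inject_hash_first_match()` is a hypothetical helper, `data-hash` is a placeholder attribute name, and the footer `<div>` is assumed to be the targeted element.

```php
<?php
// Hypothetical helper: adds the hash to the FIRST occurrence of the opening tag,
// which is the behavior this issue describes as problematic.
function inject_hash_first_match(string $html, string $opening_tag, string $hash): string {
    $pos = strpos($html, $opening_tag);
    if ($pos === false) {
        return $html;
    }
    // Turn "<div>" into "<div data-hash="...">" (placeholder attribute name).
    $replacement = substr($opening_tag, 0, -1) . ' data-hash="' . $hash . '">';
    return substr_replace($html, $replacement, $pos, strlen($opening_tag));
}

// Reduced version of the fixture: two identical <div> opening tags.
$html = '<body><main><article><header><div></div></header></article></main>'
      . '<footer><div><p>Original footer</p></div></footer></body>';

// Assumption: DOMDocument selected the footer <div>, but only its tag string is searched for.
$result = inject_hash_first_match($html, '<div>', 'abc123');
echo $result, PHP_EOL;
```

In this sketch the hash ends up on the `<div>` inside `<header>`, while the `<div>` inside `<footer>` is left untouched.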
Expected behavior
The LRC hash must be injected into the element identified by DOMDocument, especially when the same opening tag appears multiple times in the HTML and the first occurrence is not eligible because it is too deep.
Acceptance Criteria
- Remove attribute-added-to-bypass-dom-processor-known-issue in all tests and fixtures. Tests should pass.
- The issue should not be reproducible on the identified templates:
  - https://wp-media.slack.com/archives/CUT7FLHF1/p1725892450082049
  - https://wp-media.slack.com/archives/CUT7FLHF1/p1725624439071799
Additional information
I conducted some experiments a few months ago on a similar issue, which could be useful here. See this code. The idea was to keep track of the position of the last opening tag processed and to do the search and replace starting from that position, so that we don't process anything earlier in the file. This is done with the PREG_OFFSET_CAPTURE option of preg_match_all, which we could try to use here.
On the example above, we would inject the
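
A rough sketch of the offset-tracking idea, under stated assumptions, could look like the following. It is not the linked experiment: it uses `preg_match` with the `PREG_OFFSET_CAPTURE` flag and its offset argument instead of `preg_match_all`. The function name, the `data-hash` attribute, and the hard-coded `$tags` list are all hypothetical; in practice the list would come from the DOMDocument traversal, in document order, including non-eligible tags so they can be skipped.

```php
<?php
/**
 * Hypothetical sketch: walk opening tags in document order and search for
 * each one only after the position of the previously processed tag.
 *
 * @param string $html Raw HTML.
 * @param array  $tags List of [opening tag, hash or null] in document order;
 *                     null means "not eligible": skip it but still advance.
 */
function inject_hashes_in_order(string $html, array $tags): string {
    $offset = 0;
    foreach ($tags as [$opening_tag, $hash]) {
        $pattern = '/' . preg_quote($opening_tag, '/') . '/';
        // The 5th argument starts the search at $offset; PREG_OFFSET_CAPTURE
        // returns the absolute byte position of the match in $html.
        if (!preg_match($pattern, $html, $m, PREG_OFFSET_CAPTURE, $offset)) {
            continue;
        }
        $pos         = $m[0][1];
        $replacement = $opening_tag;
        if ($hash !== null) {
            // Placeholder attribute name, for illustration only.
            $replacement = substr($opening_tag, 0, -1) . ' data-hash="' . $hash . '">';
            $html        = substr_replace($html, $replacement, $pos, strlen($opening_tag));
        }
        // Never look at anything before the end of the tag just processed.
        $offset = $pos + strlen($replacement);
    }
    return $html;
}

$html = '<body><main><article><header><div></div></header></article></main>'
      . '<footer><div><p>Original footer</p></div></footer></body>';

$tags = [
    ['<div>', null],     // header <div>: assumed too deep, not eligible
    ['<div>', 'abc123'], // footer <div>: assumed to be the targeted element
];

$result = inject_hashes_in_order($html, $tags);
echo $result, PHP_EOL;
```

The design choice here is that non-eligible tags still move the offset forward, which is what prevents a later identical tag from matching an earlier occurrence. Whether the real processor can cheaply enumerate non-eligible tags in order is an open question this sketch does not answer.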