YoastSEO.js
Create a regex that selects all text after a subheading
We need a regex that captures the text following each subheading. This is hard to do in a single regex, because the match has to start after a heading and stop at the next heading or at the end of the text. If the pattern captures up to the next subheading, that subheading is consumed by the match, so the block of text after it is skipped. For now we use a workaround to be sure we capture the right blocks of text: we remove all | characters from the text, replace every heading with a |, and split on |.
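The current workaround can be sketched roughly as follows. This is a minimal illustration, not the actual YoastSEO.js code: the function name `getTextBlocksAfterSubheadings` is hypothetical, and dropping the text before the first heading is an assumption based on the goal of only collecting text that follows a subheading.

```javascript
// Hypothetical helper for illustration; not part of YoastSEO.js.
function getTextBlocksAfterSubheadings( text ) {
	// Remove existing pipes so they cannot be mistaken for the separator.
	const withoutPipes = text.replace( /\|/g, "" );

	// Replace every complete heading (opening tag, content, closing tag) with a pipe.
	// The backreference \1 makes sure <h2> is closed by </h2>, not by another level.
	const withSeparators = withoutPipes.replace( /<h([1-6])\b[^>]*>[\s\S]*?<\/h\1>/gi, "|" );

	// Split on the pipe. The first element is the text before the first heading
	// (assumption: we do not want it), so it is dropped.
	return withSeparators.split( "|" ).slice( 1 );
}

const blocks = getTextBlocksAfterSubheadings(
	"Intro<h2>First</h2>Text a<h3 class=\"x\">Second</h3>Text c"
);
console.log( blocks );
```

Note that this approach silently removes any | that the author wrote in the text, which is one reason to look for an alternative.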
This gives the desired outcome, but it could probably be improved.
We could add a marker after each </h*> and before each <h*>, which would let us split on that marker. For example, use <yoastparagraph> and follow these steps:
- Add <yoastparagraph> after each closing subheading tag (for example: </h1>).
- Add </yoastparagraph> before each opening subheading tag (for example: <h1>).
- Add </yoastparagraph> to the end of the text.
- Use a regex to get everything between <yoastparagraph> and </yoastparagraph>.
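The steps above could look something like this. It is only a sketch under assumptions: the function name `getTextBetweenMarkers` is made up for illustration, headings are assumed to be h1 through h6, and nested headings are not handled.

```javascript
// Hypothetical helper for illustration; not part of YoastSEO.js.
function getTextBetweenMarkers( text ) {
	const open = "<yoastparagraph>";
	const close = "</yoastparagraph>";

	// Step 1: add the opening marker after every closing subheading tag.
	// "$&" in a replacement string stands for the matched text.
	let marked = text.replace( /<\/h[1-6]>/gi, "$&" + open );

	// Step 2: add the closing marker before every opening subheading tag.
	marked = marked.replace( /<h[1-6]\b[^>]*>/gi, close + "$&" );

	// Step 3: add the closing marker to the end of the text.
	marked += close;

	// Step 4: collect everything between an opening and a closing marker.
	// The lazy quantifier stops at the first closing marker after each opening one.
	const pattern = /<yoastparagraph>([\s\S]*?)<\/yoastparagraph>/g;
	const blocks = [];
	let match;
	while ( ( match = pattern.exec( marked ) ) !== null ) {
		blocks.push( match[ 1 ] );
	}
	return blocks;
}

console.log( getTextBetweenMarkers( "Intro<h2>First</h2>Text a<h3>Second</h3>Text c" ) );
```

The closing marker that ends up before the first heading has no matching opening marker, so the regex in step 4 simply skips it. Unlike the | workaround, this leaves any | in the original text untouched.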
What do we do with nested headings? Remove the inner blocks and ignore them?
If we do, we can convert all headings to the same level and just split on <h1>, then split each result on </h1> and use the part after the closing tag (index 1 of that split) as the found text.
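A rough sketch of this last idea, assuming nested headings have already been removed or do not occur. The function name `getTextAfterNormalizedHeadings` is hypothetical, and taking index 1 of the second split is an interpretation of the step described above.

```javascript
// Hypothetical helper for illustration; not part of YoastSEO.js.
function getTextAfterNormalizedHeadings( text ) {
	// Convert every heading tag, whatever its level or attributes, to a bare h1.
	const normalized = text
		.replace( /<h[1-6]\b[^>]*>/gi, "<h1>" )
		.replace( /<\/h[1-6]>/gi, "</h1>" );

	return normalized
		// Split on the opening tag and drop the text before the first heading.
		.split( "<h1>" )
		.slice( 1 )
		// Each chunk now looks like "Heading text</h1>Text after the heading".
		.map( ( chunk ) => chunk.split( "</h1>" ) )
		// Skip chunks without a closing tag, which is what a nested heading produces.
		.filter( ( parts ) => parts.length > 1 )
		// Index 0 is the heading text, index 1 is the text that follows it.
		.map( ( parts ) => parts[ 1 ] );
}

console.log(
	getTextAfterNormalizedHeadings( "Intro<h2 class=\"a\">First</h2>Text a<h3>Second</h3>Text c" )
);
```

This avoids inserting markers into the text, at the cost of discarding the original heading levels in the normalized copy.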