wikiblame
HTML start_token is outdated
I'm currently getting matches to JS code inside HTML script tags on English Wikipedia, because $start_token inside chop_content() no longer matches the page markup.
https://github.com/FlominatorTM/wikiblame/blob/64a254548d06d844ce435b58d039039e49abaeab/shared_inc/wiki_functions.inc.php#L318
The article data now starts with <div class="mw-content-ltr mw-parser-output", but there's also <div class="mw-content-rtl mw-parser-output" on wikis with RTL scripts.
The most future-proof solution would be using the API: https://en.wikipedia.org/w/api.php?action=parse&page=API&prop=text&disableeditsection=&formatversion=2 gives approximately the same result (including the removal of the [bearbeiten] section edit links, but in all languages), and its output is generally stable.
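As a rough illustration of the API approach, here is a minimal Python sketch (wikiblame itself is PHP, so this is not the project's code). It builds the same action=parse request as the URL above and pulls the HTML out of the JSON response. The function names and the User-Agent string are hypothetical, and format=json is added on the assumption that a script wants JSON rather than the HTML debug view.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_parse_url(lang: str, title: str) -> str:
    """Build an action=parse URL like the one quoted in the issue."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "text",
        "disableeditsection": "",  # omit the section edit links
        "formatversion": "2",      # "text" is then a plain string
        "format": "json",          # assumption: JSON output for scripts
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def extract_html(payload: dict) -> str:
    """Return the rendered article HTML from a decoded API response."""
    if "error" in payload:
        raise RuntimeError(payload["error"].get("info", "unknown API error"))
    return payload["parse"]["text"]


def fetch_article_html(lang: str, title: str) -> str:
    """Fetch and return the article HTML (performs a network request)."""
    request = Request(
        build_parse_url(lang, title),
        # Hypothetical User-Agent; a real tool should identify itself.
        headers={"User-Agent": "wikiblame-example/0.1"},
    )
    with urlopen(request, timeout=30) as response:
        return extract_html(json.load(response))
```

With this, the search no longer depends on which wrapper div the skin emits, so the ltr/rtl class difference stops mattering.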
Thanks for the issue, @kidhanis, and for the suggestion, @tacsipacsi, which I implemented.