
HTML start_token is outdated

Open kidhanis opened this issue 10 months ago • 1 comments

On English Wikipedia I'm currently getting matches in JS code inside HTML script tags. The cause is that $start_token inside chop_content() no longer matches the page markup: https://github.com/FlominatorTM/wikiblame/blob/64a254548d06d844ce435b58d039039e49abaeab/shared_inc/wiki_functions.inc.php#L318

The article data now starts with <div class="mw-content-ltr mw-parser-output", and on wikis with right-to-left scripts it starts with <div class="mw-content-rtl mw-parser-output" instead.
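To illustrate the two variants, here is a minimal sketch of a start token that accepts either direction class. It is written in Python for brevity and is not the project's PHP code; the names START_TOKEN and find_content_start are hypothetical, and the pattern assumes the class attribute appears exactly as quoted above.

```python
import re

# Hypothetical pattern: matches the opening of the parser-output wrapper
# for both direction classes quoted in this issue (ltr and rtl).
START_TOKEN = re.compile(r'<div class="mw-content-(?:ltr|rtl) mw-parser-output"')


def find_content_start(html: str) -> int:
    """Return the index where the article content wrapper begins, or -1."""
    match = START_TOKEN.search(html)
    return match.start() if match else -1
```

Anything before the returned index, such as script tags in the page head, could then be cut off before searching. A literal match like this would break again if the class list changes.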

kidhanis, Apr 06 '24 06:04

The most future-proof solution would be to use the API. https://en.wikipedia.org/w/api.php?action=parse&page=API&prop=text&disableeditsection=&formatversion=2 gives approximately the same result (including the removal of the [bearbeiten] ("edit") links, but in all languages), and its output is generally stable.
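A rough sketch of that request, again in Python rather than the project's PHP. The function names are hypothetical, and format=json is an assumption added here so that the response is machine-readable; the other parameters are the ones from the URL above. With formatversion=2 the rendered HTML is expected as a plain string under parse.text.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_parse_url(page: str, endpoint: str = API_ENDPOINT) -> str:
    """Build an action=parse URL with the parameters suggested in this thread."""
    params = {
        "action": "parse",
        "page": page,
        "prop": "text",
        "disableeditsection": "",  # boolean flag: present means enabled
        "formatversion": "2",
        "format": "json",  # assumption: not part of the URL quoted above
    }
    return f"{endpoint}?{urlencode(params)}"


def extract_html(response_body: str) -> str:
    """Return the rendered article HTML from a formatversion=2 JSON response."""
    data = json.loads(response_body)
    return data["parse"]["text"]
```

Fetching the URL and handling API errors are left out; the point is only that the article HTML arrives as a single field, so no start or end token is needed.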

tacsipacsi, Apr 06 '24 15:04

Thanks for the issue, @kidhanis, and for the suggestion, @tacsipacsi, which I have implemented.

FlominatorTM, May 20 '24 07:05