benoit74

Results 604 issues of benoit74

It looks like iframes are currently not rewritten at all in mwoffliner. They should be. See https://github.com/openzim/zim-requests/issues/1471#issuecomment-3043489876 This iframe comes from the Wikitext itself:

bug
pending clarification

Currently, the default speed seems to put quite a lot of "pressure" on MediaWiki instances. From someone at minecraft.wiki: > The way they are scraping it is really bad/resource intensive Like they...

enhancement

In https://github.com/openzim/zim-requests/issues/1260#issuecomment-3324636008, we've faced an infinite loop while following continue parameters. I don't know if this is worth it, but some kind of logic detecting that we are in such...

enhancement
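One way the loop detection mentioned above could work is to remember every `continue` object the API has returned and stop as soon as one repeats. The following is a minimal sketch under stated assumptions, not mwoffliner's actual code: `iterate_with_loop_guard`, `ContinueLoopError`, the `fetch` callable and `max_requests` are hypothetical names. It relies only on the standard MediaWiki behaviour of returning a `continue` object whose fields are merged into the next request.

```python
import json
from typing import Callable, Dict, Iterator


class ContinueLoopError(RuntimeError):
    """Raised when the same continue token comes back twice (hypothetical name)."""


def iterate_with_loop_guard(
    fetch: Callable[[Dict[str, str]], dict],
    base_params: Dict[str, str],
    max_requests: int = 10000,
) -> Iterator[dict]:
    """Yield API responses while following `continue`, aborting on a repeated token.

    `fetch` is an assumed callable that performs one API request with the given
    parameters and returns the decoded JSON response.
    """
    seen = set()
    cont: Dict[str, str] = {}
    for _ in range(max_requests):
        response = fetch({**base_params, **cont})
        yield response
        cont = response.get("continue") or {}
        if not cont:
            return  # no continue object: the listing is complete
        # Serialize with sorted keys so identical tokens always compare equal.
        key = json.dumps(cont, sort_keys=True)
        if key in seen:
            raise ContinueLoopError(f"continue token repeated: {key}")
        seen.add(key)
    # Hard cap as a second safety net for loops that never repeat exactly.
    raise ContinueLoopError(f"gave up after {max_requests} requests")
```

Tracking the full `continue` object rather than a single field keeps the guard independent of which list or generator module is being continued.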

Looks like many wikis are using some sort of dynamic thing to add classes to support light/dark themes. See e.g. https://github.com/openzim/mwoffliner/issues/2416#issuecomment-3047990568 It could help to have a scraper option to...

enhancement
question

https://www.mediawiki.org/wiki/MediaWiki_Language_Extension_Bundle https://www.mediawiki.org/wiki/Special:MyLanguage/Extension:Translate The `Translate` extension adds `Special:MyLanguage` links, which need to be properly remapped inside the ZIM. And we probably want to automatically create one ZIM per language to not...

enhancement

https://wiki.restarters.net uses a specially crafted skin: `chameleon`, see https://wiki.restarters.net/api.php?action=query&format=json&meta=siteinfo&formatversion=2&siprop=general|skins Nice enhancement for openzim/zim-requests#1357

enhancement
skin

When we have multiple levels of redirects in place, not all redirects are added correctly to the ZIM. I've made a test case on wiki.kiwix.org: `MWoffliner_Tests/Test4` is redirecting to `MWoffliner_Tests/Test3`...

bug
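To illustrate the multi-level redirect case above, here is a minimal sketch of flattening redirect chains so that every redirect points at its final target. It is an assumption about one possible approach, not how mwoffliner handles redirects today; `resolve_redirects` and the page titles in the usage lines are hypothetical.

```python
from typing import Dict


def resolve_redirects(redirects: Dict[str, str]) -> Dict[str, str]:
    """Map every redirect source to the final, non-redirect target.

    `redirects` maps a redirect title to its immediate target.
    Sources that are part of a redirect cycle are left out of the result.
    """
    resolved: Dict[str, str] = {}
    for source, target in redirects.items():
        seen = {source}
        cyclic = False
        # Follow the chain until the target is no longer itself a redirect.
        while target in redirects:
            if target in seen:
                cyclic = True
                break
            seen.add(target)
            target = redirects[target]
        if not cyclic:
            resolved[source] = target
    return resolved


# Hypothetical three-level chain: C -> B -> A, where A is a real article.
chain = {"C": "B", "B": "A"}
flattened = resolve_redirects(chain)
```

With the hypothetical chain above, both `C` and `B` end up pointing directly at `A`, so each redirect can be written to the ZIM against an entry that actually exists.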

Currently, requests to action=parse endpoint (e.g. https://mdwiki.org/w/api.php?action=parse&format=json&prop=modules%7Cjsconfigvars%7Cheadhtml%7Ctext%7Cdisplaytitle%7Csubtitle&usearticle=1&disableeditsection=1&disablelimitreport=1&page=2%2C3%2C5%2C6-Tetramethoxyphenethylamine&useskin=vector&redirects=1&formatversion=2) are not cached at all. This query is used to retrieve, for a given article, its text, headhtml, display title, subtitle and list...

enhancement
question

Currently, the scraper pushes all articles to the ZIM, no matter what their contentmodel is. I feel like this is wrong: by default the scraper should scrape only the `wikitext` contentmodel,...

enhancement
question
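As a rough illustration of the contentmodel filtering proposed above, the sketch below keeps only pages whose content model is in an allow-list. It assumes page records shaped like the entries returned by a `prop=info` query with `formatversion=2`, where each page carries a `contentmodel` field; `filter_by_contentmodel` and the sample titles are hypothetical, and the `wikitext` default simply mirrors the suggestion in the issue.

```python
from typing import Iterable, List, Sequence


def filter_by_contentmodel(
    pages: Iterable[dict],
    allowed: Sequence[str] = ("wikitext",),
) -> List[str]:
    """Return the titles of pages whose contentmodel is in `allowed`.

    Pages without a `contentmodel` field are skipped rather than guessed at.
    """
    return [page["title"] for page in pages if page.get("contentmodel") in allowed]


# Hypothetical sample of page records.
sample_pages = [
    {"title": "Main Page", "contentmodel": "wikitext"},
    {"title": "MediaWiki:Common.css", "contentmodel": "css"},
    {"title": "Module:Example", "contentmodel": "Scribunto"},
    {"title": "Unknown page"},
]
kept = filter_by_contentmodel(sample_pages)
```

Passing a wider `allowed` tuple would be one way to let a scraper option opt back in to other content models.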

I think that we should enhance two things regarding logging in mwoffliner, but they are breaking changes. First, we should prefer to use "standard" log levels. Currently we use `'info',...

enhancement
question