sotoki
StackExchange websites to ZIM scraper
**Bounty Hunters**: See https://github.com/openzim/sotoki/issues/243#issuecomment-1170237942 for additional details. When running sotoki on a very large domain such as StackOverflow, memory becomes an issue. A [full run (without images) on athena](https://farm.openzim.org/pipeline/333bcb7949b73e101eff8216) worker (which...
Text is selected in the following screenshot: https://dev.library.kiwix.org/stackoverflow_en_nopic_2021-08/questions
We can see that, for example, on the Bard ZIM flavour. At the end of each post, there is a link to the author's user profile... and there should not...
For some reason, some questions are indented in the questions list (here on page 100): https://dev.library.kiwix.org/stackoverflow_en_nopic_2021-08/questions_page=100
Most likely due to an update of upstream's styles, the sotoki UI doesn't render as it should. **Before** **Now** This most likely affects all websites, but maybe in different ways.
When browsing a question on SO, the page also shows linked and related questions with their respective scores at the bottom right.
Searching for `toto` on StackOverflow returns a list of questions that contain the string, and also allows sorting these results by relevant, newest, active and score (and by default...
Currently, searching for the string `toto` returns all result pages containing `toto`, including matches in comments and answers. Considering that the snippets of text shown are too short / not always very clear,...
Currently the "real" [StackOverflow](https://stackoverflow.com/questions) allows sorting by "all" (not super useful), but also Newest, Active, Unanswered, Bountied, Frequent and Score (which I assume is our "Popular"). Having at least Newest...
For french.stackexchange.com or esperanto.stackexchange.com, content is multilingual: French and English. The filename should contain `_mul_`, and the ZIM metadata should list `fra,eng`. The same applies to all other ZIM files of that kind.
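
The rule requested above could be sketched as follows. This is a minimal illustration, not sotoki's actual code: `zim_language_fields` and its parameters are hypothetical names, and it assumes the language metadata is a comma-separated list of ISO 639-3 codes while single-language filenames keep whatever short code they use today (such as `en` in `stackoverflow_en_nopic_2021-08`).

```python
def zim_language_fields(iso639_3_codes, single_language_part):
    """Return (filename_language_part, language_metadata).

    Hypothetical helper, not part of sotoki.
    iso639_3_codes: content languages as ISO 639-3 codes, e.g. ["fra", "eng"].
    single_language_part: code used in the filename when there is only one
    language (assumed to be the short form, e.g. "fr").
    """
    # Normalise case and drop duplicates while preserving order.
    codes = list(dict.fromkeys(code.lower() for code in iso639_3_codes))
    if not codes:
        raise ValueError("at least one language code is required")

    # Multilingual content uses "mul" in the filename.
    filename_part = "mul" if len(codes) > 1 else single_language_part
    return filename_part, ",".join(codes)


# Example for a French/English site such as french.stackexchange.com.
filename_part, language_metadata = zim_language_fields(["fra", "eng"], "fr")
```

With these inputs, `filename_part` is `mul` and `language_metadata` is `fra,eng`; a single-language site keeps its existing filename code.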