zim-requests
[Recipe Done] docs.python.org
- Website URL: https://docs.python.org/
- License: Python Software Foundation License Version 2
- Desired ZIM Title: Python Documentation
- Desired ZIM Description:
- Desired ZIM Icon –png (URL or attach one): https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/1869px-Python-logo-notext.svg.png
- Language (ISO 639-3): en
- Is this a MediaWiki?: no
A ZIM of the official docs can be built using the following script, which uses zimwriterfs and imagemagick to build a ZIM from the official HTML files:
#!/usr/bin/env bash
# build ZIM file of python documentation
# requires imagemagick for image conversion
# version to build docs for
VERSION="3.12.1";
VERSION_NOFIX="3.12";
# remove old files
rm -f "python-$VERSION-docs-html.zip";
rm -rf "python-$VERSION-docs-html";
rm -f "python-$VERSION-documentation.zim";
# retrieve docs
wget "https://docs.python.org/3/archives/python-$VERSION-docs-html.zip" &&
unzip "python-$VERSION-docs-html.zip" &&
# build ZIM
DIRNAME="python-$VERSION-docs-html" &&
convert "$DIRNAME/_static/og-image.png" -resize 48x48 "$DIRNAME/_static/py-zim.png" &&
zimwriterfs "$DIRNAME" --welcome "index.html" --illustration "_static/py-zim.png" --language "eng" --title "Python $VERSION Documentation" --description "The official documentation for Python $VERSION" --creator "Python Software Foundation" --publisher "$USER" "python-$VERSION-documentation.zim" --tags "Python;_pictures:yes;_ftindex:yes;_category:documentation;_details:yes" --source "https://docs.python.org/$VERSION_NOFIX/" --name "Python $VERSION Documentation" --flavour "full" &&
echo "Done."
The version can be specified using the variables at the top of the script.
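As a small convenience, the second variable could be derived from the first instead of being set by hand. This is only a sketch: the helper name `minor_version` is hypothetical and not part of the script above, and it assumes the version always has the form MAJOR.MINOR.PATCH.

```shell
# Hypothetical helper (not in the original script): strip the patch
# component from a MAJOR.MINOR.PATCH version string.
minor_version() {
    # ${1%.*} removes the shortest suffix matching ".*",
    # i.e. the last dot and everything after it.
    printf '%s\n' "${1%.*}"
}

VERSION="3.12.1"
VERSION_NOFIX="$(minor_version "$VERSION")"
echo "$VERSION_NOFIX"   # prints 3.12
```

With this, only `VERSION` would need to be edited when building the docs for another release.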
Result size:
$ du -h python-3.12.1-documentation.zim
9.8M python-3.12.1-documentation.zim
zimcheck complains that it "found 2 empty links" in a lot of articles, but I guess that's either JS related or some failure in the original file. Either way, the ZIM works fine, including the search (both the javascript one and the xapian one), although there are still some links to external sources (e.g. the python docs for other versions) and for obvious reasons the download page itself does not work.
Thank you @IMayBeABitShy for the interim solution with zimwriterfs ; I think we will prefer to create the ZIM with zimit on the Zimfarm, but still it is valuable to know it works well with zimwriterfs
@Rexadev is this a problem if we include all Python versions? I know Python 1 and 2 are mostly unused now, but their documentation was far more limited anyway, and I think the whole volume is fairly limited in any case (and compresses pretty well).
@benoit74 Will you be creating this recipe, or shall I create it the normal way?
@RavanJAltaie I think you can do it, just configure https://docs.python.org instead of https://docs.python.org/3 as URL (and fix Title, add proper description, ...)
Recipe created https://farm.openzim.org/recipes/docs.python.org_en I'll update the library link once ready
@benoit74 the recipe succeeded and the file looks okay overall, except that in the "other resources" part the first 4 links don't open.
How can I fix that?
https://dev.library.kiwix.org/viewer#docs.python.org_en_2024-05
Why do you want to fix that? These links are on another domain, I don't get it.
Btw description and title are still significantly wrong.
The title proposed by Rexadev was correct (the ZIM does not contain documentation only for v3), and the description should not mislead users into believing that only v3.12.2 is included. All Python versions are covered in this ZIM.
I've updated the recipe to use Zimit 2.
However, I think this file was moved to production a bit too fast: while it claims to contain only English (given its name and metadata), it in fact also includes multiple other languages.
It seems pretty possible to split the ZIM into one ZIM per language with proper include / exclude rules.
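To make the idea of include / exclude rules a bit more concrete, here is a rough sketch of how URLs could be sorted by language. It assumes that translated pages live under a language prefix such as /fr/ or /pt-br/, while English pages have none (e.g. /3/ or /3.12/). The function name `doc_language` is made up for illustration, and this is not actual zimit configuration.

```shell
# Hypothetical classifier: print the language of a docs.python.org URL.
# Assumption: translations use a "xx" or "xx-yy" first path segment,
# and anything else (e.g. "3", "3.12") is the English documentation.
doc_language() {
    path="${1#https://docs.python.org/}"   # drop scheme and host
    first="${path%%/*}"                    # keep the first path segment
    case "$first" in
        [a-z][a-z]|[a-z][a-z]-[a-z][a-z]) printf '%s\n' "$first" ;;
        *) printf 'en\n' ;;
    esac
}

doc_language "https://docs.python.org/3/library/os.html"
doc_language "https://docs.python.org/fr/3/library/os.html"
doc_language "https://docs.python.org/pt-br/3.12/index.html"
```

A per-language recipe would then keep only the URLs whose result matches its own language and exclude the rest.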
The ZIM is still valuable, so let's update it with Zimit 2, but once that is done we should try to split it per language.
I've just noticed that the javascript search function does not work in docs.python.org_en_2024-06 (it shows an empty result page), although the xapian one obviously still does. Tested with kiwix-serve (and library.kiwix.org). The bug seems to be caused by _static/sphinx_highlight.js:162, where a function _ready(...) is called despite apparently not being defined. Said definition happens in _static/doctools.js:34 (confirmed via breakpoint to happen before the call).
@IMayBeABitShy could you be more specific about what you mean by "javascript search"? I'm not sure which one you are speaking about.
Ah yes, sorry. The python docs contain a search functionality implemented in javascript, which can be used to quickly search the documentation. This is what it is supposed to look like:
This is what it looks like on library.kiwix.org (NOTE: a search term has been entered into the field):
As you can see, entering a search term does open the search page of the python documentation, but the page does not show any results and also clears the entered search term. My preliminary investigations into this bug are summed up in my previous comment.
The xapian search provided by kiwix still works (it's a bug with the content of this specific ZIM, not with kiwix). As the javascript search (the one provided by the documentation itself) does work correctly when we build a ZIM directly from the documentation files using zimwriterfs (see my first comment on this issue), I believe this is a bug with the recipe or the offliner.
From my tests, the problem is that the docs.python.org/3.12/_static/glossary.json file is not scraped, because the Browsertrix crawler does not detect it as necessary. It is therefore missing, and the search cannot operate properly without it.
We have no way with zimit to easily indicate to the crawler that we need one more file (and there is probably one per python version).
I don't know (yet at least) how to fix this.
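For reference, a sketch of how the list of presumably missing files could be enumerated, one per version. The path pattern is taken from the 3.12 file mentioned above; the helper name `glossary_urls` and the version numbers passed to it are illustrative assumptions, and whether each version really ships this file is not verified here. It only prints URLs and does not solve the problem of passing them to the crawler.

```shell
# Hypothetical helper: print one candidate glossary.json URL per version.
# The versions given as arguments are examples, not a verified list.
glossary_urls() {
    for version in "$@"; do
        printf 'https://docs.python.org/%s/_static/glossary.json\n' "$version"
    done
}

glossary_urls 3.11 3.12
```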
@RavanJAltaie @Popolechien do you consider this is a big concern which should drive us to remove the ZIM from the library? I think we can live with this bug for now, all content is still available and searchable via the classic ZIM search functionality, it is just the custom docs.python.org search page which is not working.
@benoit74 I don't think it's a problem, I can live with it easily. @benoit74 shall we push it to zimit?
@RavanJAltaie I don't get it, what do you mean?
@benoit74 no worries, my bad, I misunderstood something. all is good.
I've updated the recipe with a custom CSS to remove the non-working search box
I forgot to adapt CSS rules for small screen resolutions ... CSS fixed and recipe requested again ...
File is done and OK in prod.