benoit74
This issue serves as a checklist for the release event.
- [ ] Check that dependencies have been updated to the latest version (especially python-scraper lib)
- [ ] Adjust version...
We use `os.path`, `path.py` and `pathlib` side by side in the codebase. This is pretty confusing in many situations. We should use only one, and `pathlib` is the one to choose.
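As a rough illustration of what such a migration could look like, the sketch below pairs a few common `os.path` calls with their `pathlib` equivalents. The sample path is only illustrative data and the helper name `rdf_path_for` is hypothetical, not taken from the codebase.

```python
import os.path
from pathlib import Path


def rdf_path_for(cache_dir: Path, book_id: int) -> Path:
    """Hypothetical helper: build the RDF path for a book using only pathlib."""
    return cache_dir / "epub" / str(book_id) / f"pg{book_id}.rdf"


# os.path style, shown for comparison
legacy = os.path.join("cache", "epub", "99", "pg99.rdf")

# pathlib style
modern = rdf_path_for(Path("cache"), 99)

# Common os.path helpers and their pathlib counterparts
name = modern.name      # os.path.basename(legacy)
stem = modern.stem      # os.path.splitext(os.path.basename(legacy))[0]
suffix = modern.suffix  # os.path.splitext(legacy)[1]
parent = modern.parent  # os.path.dirname(legacy)
```

Because `Path` objects compare by their parts, `Path(legacy)` and `modern` refer to the same location, which makes it possible to convert one call site at a time.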
In Gutenberg logs, only one logger `name` (`gutenberg2zim.constants`) is used, which makes the logger name pretty useless.
```
[gutenberg2zim.constants::2023-08-19 11:40:30,563] INFO: Parsing file cache/epub/99/pg99.rdf for book id 99
[gutenberg2zim.constants::2023-08-19 11:40:31,442] INFO: Parsing file...
```
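One common way to get a distinct logger name per module is to call `logging.getLogger(__name__)` in each module instead of sharing a single logger object. The sketch below simulates this with explicit names. The module names `gutenberg2zim.rdf` and `gutenberg2zim.download` and the format string are assumptions chosen to resemble the log excerpt, not the project's actual configuration.

```python
import io
import logging

# Capture output in memory so the example is self-contained
stream = io.StringIO()
handler = logging.StreamHandler(stream)
# Assumed format, modelled on the log excerpt (timestamp omitted for brevity)
handler.setFormatter(logging.Formatter("[%(name)s] %(levelname)s: %(message)s"))

# Configure the package-level logger once; child loggers propagate to it
package_logger = logging.getLogger("gutenberg2zim")
package_logger.setLevel(logging.INFO)
package_logger.addHandler(handler)
package_logger.propagate = False

# In a real module this would be: logger = logging.getLogger(__name__)
rdf_logger = logging.getLogger("gutenberg2zim.rdf")            # hypothetical module
download_logger = logging.getLogger("gutenberg2zim.download")  # hypothetical module

rdf_logger.info("Parsing file for book id 99")
download_logger.info("Downloading book id 99")

output = stream.getvalue()
```

With this layout, handlers and levels are still configured in one place, while each record carries the name of the module that emitted it.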
Upgrade Python + JS + CSS dependencies to latest versions
See https://github.com/openzim/zim-requests/issues/841. The most probable explanation is that the scraper does not properly handle books that have multiple associated languages.
This issue serves as a checklist for the release event.
- [ ] Check that dependencies have been updated to the latest version (especially python-scraper lib)
- [ ] Adjust version...
A [few lines](https://github.com/openzim/gutenberg/blob/5be8c9b3fd6dfb61d69219a4fe1d703b5b8a9857/gutenbergtozim/rdf.py#L281-L288) are forcing the presence of the PDF format for all books. This is causing broken links / weird buttons in all books which do not have a...
Do not force the presence of the HTML format for all books anymore (see [here](https://github.com/openzim/gutenberg/blob/1007f4ae308078280c569fd83abffa8da611693e/gutenbergtozim/download.py#L120-L121)). Especially once https://github.com/openzim/gutenberg/issues/95 has been solved, it makes little sense to always embed the HTML format,...
#191 raised many BeautifulSoup type issues (most of them in `update_html_for_static` in `export.py`), for which we had to add `# type: ignore` hints. Could those issues be fixed with a more...
When updating an ORM object (e.g. `book.html_etag = etag`), #191 had to add a `# type: ignore` hint because pyright was complaining that a `str` cannot be used to set a...