devdocs.io 404
Can you help create a ZIM of https://devdocs.io?
https://tukasu.ml/devdocs.io_en_all_2020-11/ does not work as it should.
Try adding two extra items to the default set and you will see that it doesn't work. Requests start pointing at https://devdocs.io, which returns a 404 error. For example:
https://tukasu.ml/devdocs.io_en_all_2020-11/A/mp_/https://devdocs.io/docs/babel/index.json?1517763345 (returns 404)
https://tukasu.ml/devdocs.io_en_all_2020-11/A/mp_///docs.devdocs.io/babel/index.json?1517763345 (correct link)
This looks like a replay issue. Whatever the item, the first one works but the subsequent ones don't.
First request is made to
/devdocs.io_en_all_2020-11/A/mp_///docs.devdocs.io/cpp/index.json
The other ones look like
/devdocs.io_en_all_2020-11/A/mp_/https://devdocs.io/docs/babel/index.json
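Notice the difference after the /mp_/ part. One includes the full URL with its scheme (https://devdocs.io/docs/...), the other is scheme-less and points at a different host (//docs.devdocs.io/...).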
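To make the difference concrete, here is a minimal sketch that splits each replay path at the mp_/ segment and parses what follows. The helper names (`replayed_target`, `describe`) are made up for illustration and are not part of any replay tool; only the two paths come from the requests quoted above.

```python
from urllib.parse import urlsplit

MARKER = "/mp_/"  # replay segment seen in the request paths above


def replayed_target(path: str) -> str:
    """Return whatever follows the first mp_/ segment in a replay path."""
    _, _, target = path.partition(MARKER)
    return target


def describe(path: str) -> dict:
    """Break the replayed target into scheme, host and path (hypothetical helper)."""
    parts = urlsplit(replayed_target(path))
    return {"scheme": parts.scheme, "host": parts.netloc, "path": parts.path}


working = "/devdocs.io_en_all_2020-11/A/mp_///docs.devdocs.io/cpp/index.json"
failing = "/devdocs.io_en_all_2020-11/A/mp_/https://devdocs.io/docs/babel/index.json"

for label, path in (("working", working), ("failing", failing)):
    print(label, describe(path))
```

The working request targets the host docs.devdocs.io with no scheme, while the failing one targets devdocs.io with an https scheme and a /docs/ path prefix, so the two do not resolve to the same archived resource.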
@ikreymer what do you think?
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
There is a recipe at https://farm.openzim.org/recipes/devdocs.io, but it still fails. We should investigate with the latest Browsertrix.
Current scrape dies with:
[DEBUG] Adding fuzzy redirect https://documents.devdocs.io/matplotlib~3.1/style_api.html? -> https://documents.devdocs.io/matplotlib~3.1/style_api.html?1565298356
[DEBUG] Adding fuzzy redirect https://documents.devdocs.io/matplotlib~3.2/_as_gen/matplotlib.figure.figure.html? -> https://documents.devdocs.io/matplotlib~3.2/_as_gen/matplotlib.figure.figure.html?1609091519
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
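For readers unfamiliar with these log lines, the sketch below reproduces the mapping they show: a URL whose query value has been dropped (keeping the trailing "?") points to the same URL with its timestamp query. This is inferred from the log output only and is an assumption, not the scraper's actual implementation; `fuzzy_key` and `build_fuzzy_redirects` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit


def fuzzy_key(url: str) -> str:
    """Drop the query value but keep a trailing '?', matching the log format."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", "")) + "?"


def build_fuzzy_redirects(urls):
    """Map each query-less key to its full URL (assumed behaviour, see lead-in)."""
    redirects = {}
    for url in urls:
        if urlsplit(url).query:  # only URLs that actually carry a query
            redirects[fuzzy_key(url)] = url
    return redirects


captured = [
    "https://documents.devdocs.io/matplotlib~3.1/style_api.html?1565298356",
    "https://documents.devdocs.io/matplotlib~3.2/_as_gen/matplotlib.figure.figure.html?1609091519",
]

for source, target in build_fuzzy_redirects(captured).items():
    print(f"{source} -> {target}")
```

One entry is created per captured URL with a query.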
Let us release 1.2.0 and see if the bug is still there.