mwoffliner

Article scraped incompletely

Open Inbefortus opened this issue 3 years ago • 2 comments

ZIM: 2021-03 English Wikipedia, 2021-07 English Wikipedia 0.8

The article https://en.wikipedia.org/wiki/1760s is missing a large amount of content in both ZIMs, most notably in 2021-03.

2021-03 English Wikipedia:

https://user-images.githubusercontent.com/71934042/126863828-3184543c-9398-4990-a16d-04c26872f47e.mp4

2021-07 English Wikipedia 0.8:

https://user-images.githubusercontent.com/71934042/126863850-a53d7a39-c554-462e-91b0-63d8bcf41389.mp4

Original English Wikipedia:

https://user-images.githubusercontent.com/71934042/126863925-1b20e8ec-ab97-4b5e-81da-f0de0b8c5c45.mp4

It looks like nothing after this error is scraped. Each version shows a different variant of the error.

2021-03 English Wikipedia:

Screenshot_20210724-111742_Samsung Internet

2021-07 English Wikipedia 0.8:

Screenshot_20210724-111806_Samsung Internet

Inbefortus, Jul 24 '21 09:07

Quite sure this is an upstream problem, but it should be clearly identified.

kelson42, Aug 15 '21 16:08

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

stale[bot], Nov 09 '21 20:11

@kelson42 After quite a while, this bug is at last no more!

Screenshot_20220925-195415_Samsung Internet Beta

ZIM: https://farm.openzim.org/pipeline/2b9cc933987bbbb3c55ce236

Upstream: https://en.wikipedia.org/api/rest_v1/page/html/1760s
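
For anyone who wants to check the upstream HTML for a similar problem, here is a rough sketch. It assumes the error shows up in the markup as an element with the class `error` or with `typeof="mw:Error"`; that is a guess about the markup, not something confirmed in this issue. The helper names (`rest_html_url`, `find_error_markers`) and the User-Agent string are made up for illustration and are not part of mwoffliner.

```python
from html.parser import HTMLParser
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base of the upstream endpoint linked above.
REST_BASE = "https://en.wikipedia.org/api/rest_v1/page/html/"


def rest_html_url(title: str) -> str:
    """Build the REST HTML URL for an article title (hypothetical helper)."""
    return REST_BASE + quote(title.replace(" ", "_"), safe="")


class ErrorFinder(HTMLParser):
    """Collect (tag, line number) for elements that look like error markers.

    Assumption: errors carry class="error" or typeof="mw:Error".
    """

    def __init__(self) -> None:
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        typeof = (attr_map.get("typeof") or "").split()
        if "error" in classes or "mw:Error" in typeof:
            self.hits.append((tag, self.getpos()[0]))


def find_error_markers(html: str) -> list:
    """Return every suspected error marker found in the given HTML."""
    finder = ErrorFinder()
    finder.feed(html)
    finder.close()
    return finder.hits


if __name__ == "__main__":
    # Network access is needed here; the User-Agent value is a placeholder.
    request = Request(rest_html_url("1760s"),
                      headers={"User-Agent": "zim-upstream-check/0.1 (example)"})
    with urlopen(request) as response:
        body = response.read().decode("utf-8")
    for tag, line in find_error_markers(body):
        print(f"possible error marker <{tag}> at line {line}")
```

If this prints nothing for the upstream HTML while the ZIM still shows a truncated article, the problem is more likely on the scraper side; if it prints markers, the truncation point can be compared against the first reported line.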

Inbefortus, Sep 25 '22 19:09