mwoffliner
Infoboxes are missing in many articles
Kiwix Version: 3.4.4 ZIM: 2021-03 German Wikipedia
The additional information shown on the right-hand side is not scraped and is therefore missing.
Kiwix:
https://user-images.githubusercontent.com/71934042/124442750-57493c80-dd7d-11eb-9e77-744b803816c7.mp4
https://user-images.githubusercontent.com/71934042/124442760-57e1d300-dd7d-11eb-9d00-7abd081b6c64.mp4
Original German Wikipedia:
@Inbefortus I see this content in wikipedia_de_all_maxi_2020-06.zim when rendered in Kiwix JS PWA. So it's either an issue with the latest ZIM or an issue with the reader. If you try your ZIM with pwa.kiwix.org (just visit that page in a browser and pick your file), let us know whether the content is still missing. I suggest you turn on the desktop style (in Configuration) for the closest rendering to what you're seeing above, though I still see the content in mobile style on the earlier ZIM too, just not as well formatted.
It's conclusively an issue with the ZIM file. I get the same result:
You should compare like with like, i.e. the mobile output; these are screenshots of the desktop output.
Original German Wikipedia (mobile version):
Kiwix JS PWA (mobile version):
OK, so I agree that the problem lies with that ZIM version, possibly due to some change in MWOffliner or Parsoid between 2020-06 and 2021-03. The image below, for reference, shows the desktop version of the 2020-06 German ZIM in Kiwix JS PWA.
I have checked, and it is as I said: the mobile API does not deliver the infobox: https://de.wikipedia.org/api/rest_v1/page/mobile-sections/Schlacht_bei_Wavre
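To reproduce this check without eyeballing raw JSON, one can fetch the mobile-sections response and scan each section's HTML for an element whose class mentions "infobox". This is a minimal sketch under stated assumptions: the response layout (a `lead` and a `remaining` object, each with a `sections` list of items carrying `id` and `text`) is assumed, the function and class names are hypothetical, and matching on the class name will miss any infobox that is not marked with an `infobox` class.

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Endpoint quoted in the thread. The JSON layout used below ("lead"/"remaining",
# each holding a "sections" list of {"id", "text"}) is an ASSUMPTION.
BASE = "https://de.wikipedia.org/api/rest_v1/page/mobile-sections/"


class InfoboxFinder(HTMLParser):
    """Hypothetical helper: sets `found` if any start tag has a class token containing 'infobox'."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # attribute values can be None for valueless attributes
            if name == "class" and value:
                if any("infobox" in token.lower() for token in value.split()):
                    self.found = True


def infobox_sections(payload: dict) -> list:
    """Return the ids of sections whose HTML contains an infobox-classed element."""
    sections = (payload.get("lead", {}).get("sections", [])
                + payload.get("remaining", {}).get("sections", []))
    hits = []
    for section in sections:
        finder = InfoboxFinder()
        finder.feed(section.get("text", ""))
        finder.close()
        if finder.found:
            hits.append(section.get("id"))
    return hits


def fetch_mobile_sections(title: str) -> dict:
    """Download the mobile-sections JSON for a de.wikipedia.org article (needs network access)."""
    url = BASE + urllib.parse.quote(title, safe="")
    request = urllib.request.Request(url, headers={"User-Agent": "infobox-check-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage (requires network access): `infobox_sections(fetch_mobile_sections("Schlacht_bei_Wavre"))` returns the ids of the sections containing an infobox-classed element, or an empty list if none is present in the API response.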
@Inbefortus The screenshot you provide was taken of the mobile view in your desktop browser, and it does indeed have the infobox. I have no idea how that view is built, but it does not use the mobile API (which MWoffliner does).
The Wikipedia Android app uses the mobile API and does not show the infobox either.
We may regret that this is not provided by the Wikipedia mobile API, but that is not our decision. We may also regret that MWoffliner does not scrape from the desktop API, but we decided a few years ago to focus on mobile because we don't have the resources to provide both (desktop + mobile). @Jaifroid That said, I am surprised that in 2020-06 we were still scraping from the desktop API... but that does not change much about what I said earlier.
So far MWoffliner works as intended. Closing the ticket.
@kelson42 2020-06 is scraped from the Mobile API (the entire ZIM has mobile styles, the desktop views I showed were merely the application of a desktop style by the reader). For whatever reason, the API stopped providing the infoboxes between 2020-06 and today, at least for some ZIMs and some pages. There are definitely infoboxes in other current Wikimedia ZIMs (well, I haven't downloaded new ones for about a month). Maybe there is something special about these particular infoboxes? Sounds like a bug in Parsoid if just these infoboxes are missing from the API...
Do we have a similar infobox which is included?
@kelson42 It depends what you mean by "similar". There are infoboxes on almost every article of the most recent Wikipedia-based ZIM I have, which is wikipedia_en_medicine-app_maxi_2021-06.zim. An example is in the first screenshot below.
The German Wikipedia 2020-06 ZIM has the "Waterloo" infobox, but it is not identified by class as an infobox. Maybe this is part of the issue with the 2021-03 ZIM, if the API has recently been updated to select infoboxes by class... See the bottom screenshot, from the 2020-06 ZIM.
@Jaifroid @Inbefortus After thinking it over, I don't want to challenge the Wikimedia team on this. To me this is not an obvious bug, even if I personally would prefer to always have the infobox. I don't really want to start a discussion about whether this should be in or out when the problem is not clear-cut. So either you open an upstream ticket yourself (I would be interested in following it) and we link it to this ticket (and keep this ticket open), or I will close this ticket (because there is nothing more that can be done at my level).
@kelson42 I think I'd need to be sure that this is a consistent upstream error with specific infobox types before filing it as a Parsoid (?) bug. There are infoboxes all over Wikipedia that are perfectly well represented in Kiwix ZIMs with mobile style, yet there are some that are missing for no apparent reason. It's not that all infoboxes are missing by any means. So either it's a random bug, or there are specific infoboxes that are being accidentally omitted by the mobile Parsoid API, even though they are somehow shown in the mobile view of Wikipedia online. We need more info, especially with the latest scrapes, before being able to claim this is an API bug.
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
@kelson42 @Jaifroid I just realized that in the latest German Wikipedia ZIM (2022-06), infoboxes are now also missing from all articles about films/series.
One has to wonder whether this is a bug or intentional. If this continues and at some point all infoboxes in articles about people, countries, animals, cities, etc. are no longer available, a lot of important information will be lost.
However, to settle this once and for all, I will be creating an upstream ticket tomorrow, so stay tuned!
@kelson42 @Jaifroid Here it is:
- https://phabricator.wikimedia.org/T311817
@Inbefortus and @Jaifroid: There is a similar issue with the German Wiktionary and its ZIM file.
I filed a bug upstream, and the two issues might be related: https://phabricator.wikimedia.org/T319303
If you have any ideas, please follow that thread too. Thanks !