New mobile-html Wikimedia returns random empty responses

Open benoit74 opened this issue 1 year ago • 27 comments

Recipe: https://farm.openzim.org/recipes/wikipedia_arz_all

Error is:

[error] [2024-03-02T00:03:51.082Z] Error downloading article كريس_ستاندرينج
[error] [2024-03-02T00:03:51.086Z] Failed to run mwoffliner after [7351s]: {
	"name": "Error",
	"message": "Cannot render [] into an article"
}
[error] [2024-03-02T00:03:51.086Z] 

**********

Cannot render [] into an article

**********

Looks like the article is not empty online: https://arz.wikipedia.org/wiki/%D9%83%D8%B1%D9%8A%D8%B3_%D8%B3%D8%AA%D8%A7%D9%86%D8%AF%D8%B1%D9%8A%D9%86%D8%AC

benoit74 • Mar 04 '24 08:03

wikipedia_id_all is impacted as well: https://github.com/openzim/zim-requests/issues/879

benoit74 • Mar 12 '24 14:03

@audiodude This now prevents many Wikipedias from rendering properly. I believe this is not a regression in 1.14, but it seriously hinders us from moving forward with testing 1.14. The last run of WPAR is impacted: https://farm.openzim.org/pipeline/c8708ce9-f831-4c06-a9d6-748e6e860cec/debug

kelson42 • Jun 29 '24 06:06

WPCA is impacted as well: https://farm.openzim.org/pipeline/1c29259f-d858-40f4-8cfb-530696e2b20f/debug

kelson42 • Jun 30 '24 09:06

Although the error message is the same, I'm not sure this is the same bug.

For WPARZ, I cannot reproduce with an articleList of only كريس_ستاندرينج.

For WPCA, it is 100% reproducible with an articleList of Khalifa_ibn_Askar. However, https://ca.wikipedia.org/api/rest_v1/page/mobile-html/Khalifa_ibn_Askar itself returns empty/missing data: https://gist.github.com/audiodude/139ad898a925733d56fd08fee5a5fb9f
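A quick way to check whether the upstream endpoint itself is returning nothing for a given article might look like the sketch below. It is only an illustration: `probeArticle`, `looksEmpty`, and the `Fetcher` type are hypothetical names, not part of mwoffliner, and treating a blank body or a literal `[]` as "empty" is an assumption based on the error message above.

```typescript
// Minimal shape of a fetch-like function (an assumption, chosen so the
// check can be exercised without network access).
type Fetcher = (url: string) => Promise<{ status: number; text(): Promise<string> }>;

// Build the mobile-html URL using the pattern quoted in this thread.
function mobileHtmlUrl(lang: string, title: string): string {
  return `https://${lang}.wikipedia.org/api/rest_v1/page/mobile-html/${encodeURIComponent(title)}`;
}

// Assumption: "empty" means a blank body or a literal empty JSON array.
function looksEmpty(body: string): boolean {
  const trimmed = body.trim();
  return trimmed.length === 0 || trimmed === "[]";
}

// Fetch one article and report the HTTP status and whether the body looks empty.
async function probeArticle(
  lang: string,
  title: string,
  fetcher: Fetcher,
): Promise<{ url: string; status: number; empty: boolean }> {
  const url = mobileHtmlUrl(lang, title);
  const res = await fetcher(url);
  const body = await res.text();
  return { url, status: res.status, empty: looksEmpty(body) };
}
```

On a runtime with a global `fetch` (for example Node 18 or later), something like `probeArticle("ca", "Khalifa_ibn_Askar", fetch)` could be run a few times in a row to see whether the empty result is stable or intermittent.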

WPID doesn't reproduce the bug when using an article list of IL-2_Sturmovik_(series). However it fails otherwise with the following stack trace: https://gist.github.com/audiodude/7743f8e6020c4dbe9c4f32301c7e5a6e

audiodude • Jun 30 '24 18:06

Finally, realizing that WPAR is different from WPARZ, I tried the former and could not reproduce with an articleList of توموت.

audiodude • Jun 30 '24 19:06

Hmmm, not sure what should be done next. In your log the line:

[warn] [2024-06-30T18:30:00.292Z] Couldn't find strings file for [id]

Seems suspicious.

kelson42 • Jul 01 '24 05:07

> Seems suspicious.

That's the new message added in #2050. Before, it would simply fail to find the id file, since there's no translation file for that language, and fall back silently to en. Now it logs a message whenever it can't find a required file.
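The fallback behaviour described here can be sketched roughly as follows. This is not the code from #2050: `loadStrings`, `Loader`, and the injected `warn` callback are hypothetical names used for illustration, and only the warning text is taken from the log line quoted above.

```typescript
type Strings = Record<string, string>;

// Hypothetical loader: returns the strings for a language, or undefined
// when no translation file exists for it.
type Loader = (lang: string) => Strings | undefined;

// Try the requested language first; if its file is missing, log a warning
// (instead of failing silently) and fall back to the default language.
function loadStrings(
  lang: string,
  load: Loader,
  warn: (msg: string) => void,
  fallback = "en",
): Strings {
  const found = load(lang);
  if (found !== undefined) return found;

  warn(`Couldn't find strings file for [${lang}]`);

  const fallbackStrings = load(fallback);
  if (fallbackStrings === undefined) {
    throw new Error(`No strings file for fallback language [${fallback}]`);
  }
  return fallbackStrings;
}
```

With this shape, a language without a translation file (such as `id` in the log) still gets the `en` strings, and the warning is informational rather than a sign of failure.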

audiodude • Jul 01 '24 15:07

I get it, somehow this message is missing the keyword "language"...

kelson42 • Jul 15 '24 17:07

Overall though, this issue is currently non-reproducible and seems to be due to some kind of upstream bug. Perhaps we should update the code to be more resilient to that. It's not clear what kind of Phabricator ticket we could file other than "JSON endpoint sometimes returns empty response for non-empty articles", and we would have to file it without a demonstrable reproduction case.
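One possible shape for "more resilient" is to retry a download when the response comes back empty, and only fail after several attempts. The sketch below is an assumption about how that could look, not existing mwoffliner code: `downloadWithRetry` and `Download` are hypothetical names, and the attempt count and delays are arbitrary.

```typescript
// Hypothetical download function: resolves with the raw response body.
type Download = () => Promise<string>;

// Retry when the body is blank or a literal "[]" (assumed to be what the
// intermittent empty responses look like), with exponential backoff.
async function downloadWithRetry(
  download: Download,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<string> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const body = await download();
    const trimmed = body.trim();
    if (trimmed.length > 0 && trimmed !== "[]") return body;

    // Wait before the next attempt: baseDelayMs, then 2x, 4x, ...
    if (attempt < maxAttempts) {
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Empty response after ${maxAttempts} attempts`);
}
```

This would not help for articles where the endpoint is consistently empty (as reported for Khalifa_ibn_Askar), so a final fallback such as skipping the article with a logged error would still be needed for those.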

audiodude • Jul 16 '24 00:07