edu.gcfglobal.org ZIMs are all missing some content

Open benoit74 opened this issue 1 year ago • 7 comments

ZIM(s) location

https://library.kiwix.org/#lang=&q=gcf

Recipe(s) URL

https://farm.openzim.org/recipes?name=edu.gcfglobal.org

Readers tested

  • [ ] Kiwix-serve on iOS (iPad / iPhone)
  • [ ] Kiwix-serve on Android (phone or tablet)
  • [ ] Kiwix-serve on Windows
  • [X] Kiwix-serve on Linux
  • [ ] Kiwix-serve on Raspberry Pi (e.g. hotspot)
  • [ ] Kiwix-serve on Mac
  • [ ] pwa.kiwix.org
  • [ ] Kiwix JS - Chrome extension
  • [ ] Kiwix JS - Firefox extension
  • [ ] Kiwix JS - Edge extension
  • [ ] Kiwix for Android application
  • [ ] Kiwix for MacOS application
  • [ ] Kiwix for iOS (iPad/iPhone) application

Which ZIM versions are impacted?

All PROD versions are impacted

Details

Content inside courses is loaded lesson by lesson.

For instance, on https://edu.gcfglobal.org/en/beginning-a-new-career/transferring-your-skills-to-a-new-career/1/, you start with a "Continue" button at the bottom of the page. When you click on it, Lesson 2 is loaded and appears, including a new "Continue" button at the bottom of the page. And so on.

Currently, none of the content behind the "Continue" buttons is available in the ZIM (see https://library.kiwix.org/viewer#edu.gcfglobal.org_en_all_2024-06/edu.gcfglobal.org/en/beginning-a-new-career/how-to-decide-on-a-career-field/1/). The problem is that the crawler had no idea that more content was hiding behind these "Continue" buttons.

The typical solution to this problem is to develop a custom behavior for Browsertrix Crawler that fakes clicks on these buttons so that the crawler fetches the corresponding content.

Note for self: the URL loaded by the "Continue" button seems to be protected by a timestamp, e.g. https://edu.gcfglobal.org/en/beginning-a-new-career/transferring-your-skills-to-a-new-career/content/?_=1718454918606; a fuzzy rule to remove it is most probably required.
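
To illustrate what such a fuzzy rule would need to achieve, here is a minimal Python sketch that drops the `_` query parameter so that requests differing only by timestamp map to the same URL. This is not the actual fuzzy rule syntax used by the scraper or the readers; the function name `strip_cache_buster` is hypothetical, and it assumes `_` is the only volatile parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_cache_buster(url: str, param: str = "_") -> str:
    """Return `url` without the query parameter named `param`.

    Assumption: `_` carries only a timestamp and does not change
    the content returned by the server.
    """
    parts = urlsplit(url)
    # Keep every query parameter except the volatile one.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    # An empty query string makes urlunsplit omit the trailing "?".
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(
        strip_cache_buster(
            "https://edu.gcfglobal.org/en/beginning-a-new-career/"
            "transferring-your-skills-to-a-new-career/content/?_=1718454918606"
        )
    )
```

With a rule equivalent to this, a lookup at read time with a fresh timestamp would resolve to the entry stored at crawl time.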

@Popolechien @RavanJAltaie shall we keep the ZIMs in production even if the courses are incomplete? No one complains, and the YouTube videos are present, so it is not like we have nothing, but the content is clearly incomplete. I wouldn't recommend deleting them, since their limited content already provides some value and it might take some time until I develop the custom behavior.

benoit74 avatar Jun 15 '24 12:06 benoit74

Note: it looks like properly creating a ZIM of this website might also be broken; even Zimit1 is failing to retrieve YouTube videos in my latest tests ...

benoit74 avatar Jun 17 '24 09:06 benoit74

UP? Do I delete production files which are significantly broken?

benoit74 avatar Jun 20 '24 08:06 benoit74

I'd say nobody complains because they're offline. But offering significantly broken ZIM files when downloading them comes at a cost is not appropriate.

Popolechien avatar Jun 20 '24 08:06 Popolechien

You're right, so I've deleted the ZIMs from the library. Let's start over with the creation of these ZIMs and publish them once they are really ready:

  • YouTube videos are working
  • "Continue" buttons are working

benoit74 avatar Jun 20 '24 09:06 benoit74

@benoit74 @rgaudin Two hours have passed and the files are still in the library! Why?

kelson42 avatar Jun 20 '24 11:06 kelson42

library-refresh is broken. pokemonwiki_en_all_maxi was moved from other to zimit without opening a ticket, which the procedure requires (this is a known limitation of the tool). @benoit74 is probably better informed, so he'll follow up.

rgaudin avatar Jun 20 '24 13:06 rgaudin

It's the opposite: a ticket was opened in which we decided to move the file from zimit to other, but I forgot to update the recipe, so the next ZIM, created 3 days ago, was pushed to ... zimit again 😫

Situation is fixed, library will refresh soon

benoit74 avatar Jun 20 '24 15:06 benoit74