New ZIM request: NHS conditions
- Website URL: https://www.nhs.uk/conditions/
- License: OGL (need second check : 3.4 https://www.nhs.uk/our-policies/terms-and-conditions/)
- Desired ZIM Title: Health A to Z
- Desired ZIM Description: Your complete guide to conditions, symptoms and treatments, including what to do and when to get help
- Desired ZIM Icon –png (URL or attach one): https://www.nhs.uk/static/nhsuk/img/favicons/favicon-192x192.43924bfe6c7e.png
- Language (ISO 639-3): eng
- Desired Main Page (homepage, if different from website URL): https://www.nhs.uk/conditions/ (not all the website, only the descending pages)
- Is this a MediaWiki?: no
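For illustration, the "only the descending pages" scope requested above could be expressed as a simple URL check. This is a minimal sketch, not the actual Zimit scope configuration: the names `SCOPE_HOST`, `SCOPE_PATH` and `in_scope` are hypothetical and chosen only for this example.

```python
from urllib.parse import urlsplit

# Assumed scope, taken from the requested main page: https://www.nhs.uk/conditions/
SCOPE_HOST = "www.nhs.uk"
SCOPE_PATH = "/conditions/"


def in_scope(url: str) -> bool:
    """Return True if the URL is on the NHS host and under /conditions/."""
    parts = urlsplit(url)
    return parts.hostname == SCOPE_HOST and parts.path.startswith(SCOPE_PATH)
```

With this rule, `https://www.nhs.uk/conditions/heart-attack/` would be kept, while pages elsewhere on nhs.uk would be skipped.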
Excellent idea. It looks like a pretty straightforward design, have you tried it on zimit?
Running it through youzim.it seems to do a great job :clap: (though it may just be hitting the 1000 file limit).
Recipe created https://farm.openzim.org/recipes/nhs.uk-conditions_en_all I'll update the library link once ready
File is ready at the library https://library.kiwix.org/viewer#nhs.uk-conditions_en_all_2024-08
Same CSS fix should be applied as in https://github.com/openzim/zim-requests/issues/1138
Custom CSS created; the recipe has been updated to publish to dev with this custom CSS, and a new run has been requested. Let's see.
@benoit74 I've just noticed with this ZIM (I'm testing for the first time, having been away) that none of the videos appear to work. See for example the Heart Attack video at bottom of this page: https://library.kiwix.org/viewer#nhs.uk-conditions_en_all_2024-09/www.nhs.uk/conditions/heart-attack/ . There are other examples such as the Menstrual Cycle video at the bottom of this page: https://library.kiwix.org/viewer#nhs.uk-conditions_en_all_2024-09/www.nhs.uk/conditions/periods/ .
Clearly this is Zimit-related, and not specific to this ZIM, but I thought I should note it here.
EDIT: I tested in library.kiwix.org and in the PWA. Videos don't play in either.
> I'm testing for the first time, having been away
For the record, you published this file to production on August 15, you probably already tested it or at least you should have.
The fact that videos don't work is a known limitation of the scraper. Only YouTube videos are known to work in Zimit/Warc2zim, and this is not going to change in the coming months or years.
Is it critical enough that we should remove the ZIM from production? Or is the information present sufficiently valuable without videos?
Hi @benoit74, I think you think you're replying to a different person! (I am not involved in publishing ZIMs.) The decision on whether it's critical is more for your team to make, but personally I'd say it's not critical because there is a lot of textual information. I don't know whether the underlying video files have been scraped, but if they have, then they bloat the ZIM without being accessible, and it might be an idea to exclude them.
Sorry @Jaifroid, too soon in the morning, I was convinced it was Ravan speaking ^^
Your point regarding whether videos are bloating the ZIM is indeed a good one
I confirm the ZIM is bloated with the first seconds of every video. Unfortunately I don't think we have sufficient tooling to exclude them from the ZIM; AFAIK we can do it only with https://github.com/openzim/zimit/issues/353.
I think it would be super cool if we could also replace or even watermark video posters in such situation so that we have something saying "videos not available in ZIM". I've opened https://github.com/openzim/warc2zim/issues/396 to keep the idea.
I've also opened https://github.com/openzim/warc2zim/issues/397 for a "let's dream a bit" scenario.
Regarding the current NHS conditions ZIM, and until these issues are solved, should we manually remove the useless items and publish it manually? It is work only a developer can do, but if we agree that we will not update the ZIM for the coming year, this might be worth it to avoid a big ZIM for nothing.
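To make the idea concrete, here is a rough sketch of what "removing the useless items" could look like in principle, and how one might estimate the space recovered. It assumes a hypothetical listing of archive entries with a path, MIME type and size; `Entry`, `split_video_entries` and `bytes_saved` are made-up names for this example and are not part of warc2zim, Zimit or libzim.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Entry(NamedTuple):
    """Hypothetical description of one item in the archive."""
    path: str
    mimetype: str
    size: int  # size in bytes


def split_video_entries(entries: Iterable[Entry]) -> Tuple[List[Entry], List[Entry]]:
    """Separate entries into (kept, dropped), dropping anything with a video/* MIME type."""
    kept: List[Entry] = []
    dropped: List[Entry] = []
    for entry in entries:
        if entry.mimetype.startswith("video/"):
            dropped.append(entry)
        else:
            kept.append(entry)
    return kept, dropped


def bytes_saved(dropped: Iterable[Entry]) -> int:
    """Total size of the dropped entries, i.e. the space that would be recovered."""
    return sum(entry.size for entry in dropped)
```

Filtering on the MIME type alone is an assumption; partial video chunks stored under another type would need a different rule.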
I was going to ask how much bloated is bloated but considering that NHS conditions is 4.5GB and NHS medicine is 13.5MB, I suspect I have an answer. @benoit74 can you please remove these unviewable videos?
> can you please remove these unviewable videos?
Do we agree this is a one-shot manual operation, and that I will not do it again for many months (i.e. the recipe will be disabled)?
We have no tooling for this, so I will have to do it "by hand", quite time consuming.
@benoit74 Personally (but I guess it's @Popolechien's call), I'd say it is not something you should have to do "by hand", but rather something that could wait till https://github.com/openzim/zimit/issues/353 is ready and it can be done automatically. I don't think it's so urgent as to take up valuable time that could be spent on other things. Sorry if I'm speaking (writing) out of turn! JMHO.
> We have no tooling for this, so I will have to do it "by hand"
Ah no, I thought that your hands would be writing a handy script and voilà. Never mind, then. Let's wait for openzim/zimit/issues/353 as flagged by @Jaifroid
Then we have to remove the file from production, right? If so, then please open a separate issue since the assignees are different.
Yup. Opened #1163
Wait - what's the policy again here? Keep it open as it's not ready, or close it because the recipe exists?
Never close unless we know we will never make the ZIM. Here we have good hopes to do the ZIM, so only flag it as upstream + bug.