Distinguish zim files that have not been crawled from external links

Open Popolechien opened this issue 1 year ago • 6 comments

Someone using the limited version of zimit at zimit.kiwix.org will end up with an incomplete file in which some links have not been crawled. If they are not fully aware of the limitations or of how zimit runs, clicking an internal link that has not been crawled gives them a "service unavailable" or similar error page, which can be confusing (was the page not crawled, or did the crawl fail?).

It would be clearer, and would also serve as a CTA inviting people to reach out to purchase a full zim, to show another message along the lines of: "This page was not crawled because it exceeded the limits imposed on the free version. Please reach out to [email protected] to purchase a full version of this zim."

Popolechien avatar Jan 13 '25 08:01 Popolechien

I have two points to discuss:

  • I'm not really convinced it is a good idea to display this while browsing; users of the ZIM are not necessarily the publisher of the ZIM, so they have no clue what a free version is, they potentially have no money to purchase something, ... We already have a CTA (or should have, since this is broken) in the mail the requester gets before downloading the ZIM
  • Technically speaking, it is impossible (for now at least, i.e. it would require non-negligible development) to know whether a page is missing because the crawl was incomplete or because of another problem (there might be tons of issues leading to a missing page in zimit)
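
To make the second point concrete, here is a minimal sketch of what telling the two cases apart would require. It assumes, hypothetically, that the crawler recorded every URL it discovered but skipped because of the free-version limit; nothing in this thread says zimit keeps such a list today, and every name below (`MissReason`, `classify_link`, `skipped_by_limit`) is illustrative only.

```python
from enum import Enum
from typing import Optional, Set


class MissReason(Enum):
    # Hypothetical categories, not part of zimit.
    NOT_CRAWLED_LIMIT = "not crawled: free-version limit reached"
    UNKNOWN = "unknown: failed crawl, broken link, or something else"


def classify_link(
    url: str, zim_entries: Set[str], skipped_by_limit: Set[str]
) -> Optional[MissReason]:
    """Return None if the page is in the ZIM, else the best-known reason it is missing."""
    if url in zim_entries:
        return None
    if url in skipped_by_limit:
        # Only knowable if the crawler kept a record of what it skipped (assumption).
        return MissReason.NOT_CRAWLED_LIMIT
    # Without more information, every other missing page looks the same.
    return MissReason.UNKNOWN


# Illustrative data only.
zim_entries = {"https://example.org/", "https://example.org/a"}
skipped_by_limit = {"https://example.org/b"}
```

Without the `skipped_by_limit` record, every missing page falls into the `UNKNOWN` branch, which is the situation described above.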

benoit74 avatar Jan 13 '25 10:01 benoit74

Thanks, I guess the second point closes the issue (but I would understand if we kept it open for later).

My point is that when someone offers to pay for a full version, we cannot really commit to a fully working zim, and if they're individuals they're unlikely to front the money for proper QA / fixes. The idea here was to say that the zim we will provide comes with no guarantee, but that at least all pages listed as not crawled will be crawled.

Popolechien avatar Jan 13 '25 10:01 Popolechien

This is why we have the Publisher field, and unless Publisher is openZIM, quality is by design unknown. And I think it is a matter of educating people to make a distinction. Just like it is now clear to most people that a non-working Excel spreadsheet is not the responsibility of Microsoft (in general at least, this could be debatable ^^).

Would love to gather more feedback on this issue from the rest of the team.

And regarding your point about quality, I don't really agree. When someone offers to pay for a full version, we can commit to a given quality. It is only a matter of process: first make (and pay for) an assessment, then we can propose (and commit to) a price for a given quality, or even multiple prices for multiple qualities.

benoit74 avatar Jan 13 '25 12:01 benoit74

A product you purchase works, or it does not. I can't think of any item or service available for purchase that may gradually give the expected normal behaviour (without any added new option or feature) based on price tiers.

Popolechien avatar Jan 13 '25 14:01 Popolechien

Yes, I agree. And this is why I'm not strongly in favor of pushing your suggestion into the ZIM: ZIMs are supposed to either be OK or not be published and kept for personal usage.

By gradual improvements linked to various prices, I mean exactly what we have currently done on libretexts.org: we agreed that for a given price they will have fully working ZIMs, but videos will not be present inside the ZIM. If they want the videos, they need to pay extra. Navigation inside the ZIM is also quite limited (no breadcrumbs, no next/prev buttons, ...). Again, this is an extra. And there are probably other things which could be improved for an extra, but I have already forgotten about them.

benoit74 avatar Jan 13 '25 16:01 benoit74

Yes, but not sure this is something we can predict in advance.

Popolechien avatar Jan 13 '25 17:01 Popolechien