
New ZIM: Mankier.com

Open trappedinspacetime opened this issue 6 years ago • 15 comments

Please use the following format for a ZIM creation request…

  • Website URL: https://www.mankier.com/
  • License: Copyrighted | CC-by-sa | Public domain | ...
  • Desired ZIM Title: ManKier
  • Desired ZIM Description: Linux man pages
  • Desired ZIM Icon –png (URL or attach one): https://www.mankier.com/img/kier-sq.png
  • Language (ISO 639-3): eng
  • Desired Main Page (homepage): n/a
  • Is this a MediaWiki?: yes | no
  • Articles List URL (mediawiki): n/a

I am sorry, I don't know if this is possible, but mankier.com is something developers need.

trappedinspacetime • Jun 24 '19 10:06

This is possible and a good idea.

kelson42 • Apr 30 '20 07:04

Requested https://farm.openzim.org/recipes/mankier

RavanJAltaie • Dec 03 '22 20:12

Succeeded.

RavanJAltaie • Dec 08 '22 20:12

@RavanJAltaie Thank you for your effort and the info. I checked ManKier_2022-12.zim. Unfortunately, it's only a 300 KB file and it doesn't work.

trappedinspacetime • Dec 09 '22 08:12

Yes, I can confirm it only grabbed the first page: https://dev.library.kiwix.org/viewer#mankier_2022-12/A/www.mankier.com/

Popolechien • Dec 09 '22 09:12

This cannot work with Zimit because the website relies on a web API. I would at least tag this as "Scraper needed", or decide we will never ZIM this (but the need makes sense, so we should find an alternative).

I also have some doubts about licensing, given that the code seems to be closed-source.

benoit74 • Jun 13 '24 13:06

I've pinged the website owner to ask for clarification.

Popolechien • Jun 13 '24 13:06

We got permission (see https://kiwix.freshdesk.com/a/tickets/70652). Is there anything they could do to help?

Popolechien • Jun 14 '24 06:06

Super cool!

Unfortunately, the Zimit scraper cannot be used here: it has no way to scrape the database and the API service that answer search requests for a man page.

So I'm certain they can help if they want to. At the very least, we can ask them how they would recommend creating an offline version of their website.

  • Would they be open to sharing the database with us, so that we can write a custom scraper on top of it?
  • Would they be open to sharing the source code of their website (the rendering engine seems to be open source, but not the rest of the website), so that we can leverage it to build the scraper more quickly?
  • Would they be open to contributing to this custom scraper effort? They could perhaps easily adapt their website into a "static website" version that uses no API or database, just plain (JSON) files, so that we can quickly create the scraper on top of this static website.

Details could be discussed in a live meeting if they have interest in such a project and/or directly in this issue.

benoit74 • Jun 14 '24 07:06

Hi Benoit,

There is an API and an underlying DB, used for the search and by some third parties... my assumption was you can ignore this if the goal is to package the content of the man pages which is static HTML, and exclude the search input box in Kiwix.

To get a list of all the pages I would suggest starting in the sections as I mentioned below. You can see how many pages there are per section: https://www.mankier.com/stats

Cheers, Jackson
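A minimal sketch of the section-based listing suggested above, in case it helps with a seed list. It assumes that a section index lives at a path such as `/1` and that man pages in that section live under `/1/<name>`; this URL layout, the sample markup, and the names `LinkCollector` and `page_links` are assumptions for illustration, not something confirmed in this thread. The function only parses HTML that has already been fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def page_links(section_html, section_url):
    """Return unique same-host links located under the section's path.

    Assumption: pages of a section share the section URL as a path prefix.
    """
    parser = LinkCollector()
    parser.feed(section_html)

    section = urlparse(section_url)
    prefix = section.path.rstrip("/") + "/"

    found = []
    for href in parser.hrefs:
        parts = urlparse(urljoin(section_url, href))
        if parts.netloc != section.netloc:
            continue  # skip external links
        if not parts.path.startswith(prefix) or parts.path == prefix:
            continue  # skip links outside the section and the index itself
        clean = parts._replace(query="", fragment="").geturl()
        if clean not in found:
            found.append(clean)  # keep first-seen order, drop duplicates
    return found
```

The resulting list could then be compared with the per-section counts on the stats page to check that nothing was missed.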

Recipe reconfigured (I also adjusted the title and description slightly to be more precise) and task requested: https://farm.openzim.org/pipeline/d31651c5-0ffe-4492-a04b-3298a4c39980

benoit74 • Jun 17 '24 19:06

Note: excluding the search box with custom CSS is not straightforward; at least, I failed to find a proper CSS selector. Let's live with it for a first version; we can fix that later if the first ZIM is mostly OK.

benoit74 • Jun 17 '24 19:06

ZIM is ready and mostly OK: https://dev.library.kiwix.org/viewer#www.mankier.com_en_all_2024-06

There is just one big problem: the https://dev.library.kiwix.org/viewer#www.mankier.com_en_all_2024-06/www.mankier.com/ page is completely broken. I'll open an upstream issue.

benoit74 • Jun 21 '24 06:06

Nice. I couldn't find the problematic page you mentioned. How does one get there?

Popolechien • Jun 21 '24 08:06

Click on the "Home" link

benoit74 • Jun 21 '24 09:06

The upstream issue has been fixed. A new ZIM is ready in dev. I've added custom CSS to hide ads, which are not particularly appealing or relevant once offline. Please review and move to prod if it looks OK to you.

benoit74 • Sep 14 '24 06:09
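For readers wondering what such a custom CSS amounts to, here is a minimal sketch that builds a single "hide these elements" rule from a list of selectors. The selectors in the usage example are placeholders: the thread does not say which selectors the real stylesheet uses, and `hide_rule` is a hypothetical helper name.

```python
def hide_rule(selectors):
    """Build one CSS rule that hides every element matching the selectors.

    Blank entries are dropped; an empty input yields an empty string.
    """
    cleaned = [s.strip() for s in selectors if s.strip()]
    if not cleaned:
        return ""
    # "!important" so the rule wins over the site's own display settings.
    return ",\n".join(cleaned) + " {\n  display: none !important;\n}\n"


if __name__ == "__main__":
    # Placeholder selectors, for illustration only.
    print(hide_rule([".ad-banner", "#search-form"]))
```

The same approach would cover the search box mentioned earlier, once a selector that matches it is found.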

@Popolechien can you please review the dev file: https://dev.library.kiwix.org/#lang=&q=mankier

benoit74 • Nov 02 '24 16:11

LGTM, ready for Prod.

Popolechien • Nov 03 '24 15:11

File published to prod: https://library.kiwix.org/#lang=eng&q=mankier

Recipe set to quarterly update.

benoit74 • Nov 10 '24 07:11