zim-requests

New request: radiozamaneh.com

Open benoit74 opened this issue 1 year ago • 17 comments

This is a subtask of #826, to track recipe progress one by one and avoid confusion.

  • Website URL: https://www.radiozamaneh.com/

Recipe already created here: https://farm.openzim.org/recipes/radiozamaneh.com_persian

benoit74 avatar Feb 19 '24 08:02 benoit74

The recipe failed, it produced only a 3.6MB ZIM.

According to the log, only the first page (the homepage) loaded properly and all subsequent requests appear to have been blocked: they all returned HTTP 400 (Bad Request), while the same pages work fine online.
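The symptom described above (one successful response followed by a run of HTTP 400s) can be checked mechanically. A minimal sketch, assuming the log has already been reduced to `(url, status)` pairs; that shape, the function names, the 0.9 threshold and the sample data are all illustrative and not the crawler's actual log format:

```python
from collections import Counter


def summarize_statuses(entries):
    """Count responses per HTTP status code.

    `entries` is an iterable of (url, status) pairs. This shape is an
    assumption for illustration, not the real crawler log format.
    """
    return Counter(status for _url, status in entries)


def looks_blocked(entries, threshold=0.9):
    """Return True when at least `threshold` of the responses are 4xx/5xx."""
    entries = list(entries)
    if not entries:
        return False
    errors = sum(1 for _url, status in entries if status >= 400)
    return errors / len(entries) >= threshold


# Hypothetical sample: homepage OK, every other page rejected with 400.
sample = [("https://www.radiozamaneh.com/", 200)] + [
    (f"https://www.radiozamaneh.com/page-{i}", 400) for i in range(1, 20)
]

print(summarize_statuses(sample))
print(looks_blocked(sample))
```

A ratio-based check like this only flags the run as suspicious; confirming that the pages work in a regular browser still has to be done by hand.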

As mentioned upstream, the website is protected by Deflect.ca, which seems to be quick to block us.

@Popolechien @RavanJAltaie We should contact the website owner (via our contacts) to check whether our ondemand worker could be whitelisted (public IPs are 92.243.27.71 and 2001:4b98:dc0:43:f816:3eff:fe32:84fc/64).
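For whoever ends up configuring such a whitelist, here is a rough sketch of how the two values above could be matched using Python's standard `ipaddress` module. The `ALLOWED` list and `is_allowed` helper are hypothetical names; how Deflect actually implements allow-lists is not known here.

```python
import ipaddress

# Values quoted in the comment above. strict=False is needed because the
# IPv6 value is a host address with a prefix length, not a network address;
# it normalizes to 2001:4b98:dc0:43::/64.
ALLOWED = [
    ipaddress.ip_network("92.243.27.71/32"),
    ipaddress.ip_network("2001:4b98:dc0:43:f816:3eff:fe32:84fc/64", strict=False),
]


def is_allowed(client_ip: str) -> bool:
    """Return True if `client_ip` falls inside one of the allowed networks."""
    addr = ipaddress.ip_address(client_ip)
    # Membership tests across IP versions simply return False.
    return any(addr in net for net in ALLOWED)


print(ALLOWED[1])
print(is_allowed("92.243.27.71"))
```

Note that the IPv6 entry covers the whole /64, so any address in that prefix would match, not only the worker's exact address.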

benoit74 avatar Feb 19 '24 08:02 benoit74

FYI, the newest browsertrix crawler (1.0.0-beta.3) seems to be less affected by the situation; I wonder if we should update the zimit2 image to use this new crawler even though it is still in beta.

benoit74 avatar Feb 19 '24 10:02 benoit74

@benoit74 I'll discuss this with Stephane today.

RavanJAltaie avatar Feb 20 '24 08:02 RavanJAltaie

@RavanJAltaie @Popolechien Any feedback about @benoit74's request of whitelisting?

kelson42 avatar Mar 19 '24 07:03 kelson42

@kelson42 we decided not to whitelist for now; it looks like it might not be needed with the new browsertrix crawler 1.0. The task has been running for 10 days and is almost complete.

benoit74 avatar Mar 19 '24 07:03 benoit74

I just uploaded a new WARC, which is supposed to be complete, at https://tmp.kiwix.org/ci/test-warc/radiozamaneh.com_2024-05-14/radiozamaneh_20240514.tar

benoit74 avatar May 24 '24 09:05 benoit74

Custom CSS is ready at https://drive.farm.openzim.org/zimit_custom_css/www.radiozamaneh.com.css

benoit74 avatar May 24 '24 14:05 benoit74

The WARC seems to be pretty good: conversion to ZIM found "only" 1866 unique broken links on the www.radiozamaneh.com domain. I checked a few of them (most follow the same pattern) and they are all broken on the source website as well.

This could be either a rewriting error (HTML source code not properly interpreted; unlikely, since too few items are affected from my PoV) or real issues in the source website (more likely).

I'm currently re-running the ZIM creation with the custom CSS.
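To make the "unique broken links on one domain, mostly following the same pattern" check repeatable, something along these lines could be used. It is only a sketch: the helper names are made up, the sample URLs are hypothetical and not taken from the real conversion log, and grouping by first path segment is just one crude way to spot a pattern.

```python
from collections import Counter
from urllib.parse import urlsplit


def unique_broken_on_host(urls, host):
    """Deduplicate broken URLs and keep only those whose hostname is `host`."""
    return sorted({u for u in urls if urlsplit(u).hostname == host})


def first_segment_counts(urls):
    """Count URLs by their first path segment ("/" when the path is empty)."""
    counts = Counter()
    for u in urls:
        segments = [s for s in urlsplit(u).path.split("/") if s]
        counts[segments[0] if segments else "/"] += 1
    return counts


# Hypothetical broken links, for illustration only.
broken = [
    "https://www.radiozamaneh.com/tag/a/",
    "https://www.radiozamaneh.com/tag/a/",
    "https://www.radiozamaneh.com/tag/b/",
    "https://example.org/elsewhere",
]

on_site = unique_broken_on_host(broken, "www.radiozamaneh.com")
print(len(on_site))
print(first_segment_counts(on_site))
```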

benoit74 avatar May 24 '24 15:05 benoit74

New zimit2 ZIM is available at https://dev.library.kiwix.org/viewer#radiozamaneh-com_far_all_2024-05/www.radiozamaneh.com/ or searchable with https://dev.library.kiwix.org/#lang=&q=%D9%85%D8%B3%D8%AA%D9%82%D9%84
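In case the percent-encoded search link above is hard to read: the `q` parameter sits in the URL fragment and is UTF-8 percent-encoded. A small sketch using only the standard library shows how to recover the search term (the variable names are arbitrary):

```python
from urllib.parse import parse_qs, urlsplit

url = "https://dev.library.kiwix.org/#lang=&q=%D9%85%D8%B3%D8%AA%D9%82%D9%84"

# The parameters live after "#", so they are in the fragment, not the query.
fragment = urlsplit(url).fragment

# keep_blank_values=True preserves the empty "lang" parameter.
params = parse_qs(fragment, keep_blank_values=True)
search_term = params["q"][0]

print(search_term)  # the Persian word مستقل
```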

benoit74 avatar May 27 '24 11:05 benoit74

I just found two new issues:

  • Next / Previous buttons on sections are not working. E.g. at https://dev.library.kiwix.org/viewer#radiozamaneh-com_far_all_2024-05/www.radiozamaneh.com/section/culture/

image

  • Cookie banner is not removed (now fixed in custom CSS but ZIM not updated)

image

benoit74 avatar May 27 '24 11:05 benoit74

Ah, I was going to say that it looks pretty good to me.

Popolechien avatar May 27 '24 11:05 Popolechien

It is still pretty good from my PoV ^^

benoit74 avatar May 27 '24 15:05 benoit74

A new ZIM is currently being built at https://farm.openzim.org/pipeline/f3908653-bff1-407f-95b0-4c2f698d3bd6 with the latest scraper version and the custom CSS; I expect it to produce an adequate ZIM end to end this time.

benoit74 avatar Jun 03 '24 13:06 benoit74

Looks like it succeeded in producing a good ZIM. @Popolechien please review and transfer it to the client if you are happy as well, or speak up about any remaining issues that need a fix:

https://dev.library.kiwix.org/#lang=&q=%DA%AF%D8%B2%D8%A7%D8%B1%D8%B4%DA%AF%D8%B1%DB%8C%D9%90

benoit74 avatar Jun 10 '24 07:06 benoit74

As mentioned in https://github.com/openzim/zimit/issues/339, (some) videos do not seem to work in the Chrome browser.

benoit74 avatar Jul 08 '24 05:07 benoit74

@Popolechien I am starting to see errors in the Zimfarm logs linked to Cloudflare blocking some requests. Can we contact the website owner to be whitelisted, just like we did for iranwire?

benoit74 avatar Jul 18 '24 05:07 benoit74

Let me ask.

Popolechien avatar Jul 19 '24 08:07 Popolechien

ZIM is ready in the dev library and has been moved to prod.

benoit74 avatar Sep 10 '24 09:09 benoit74

ZIM is ready in PROD: https://library.kiwix.org/viewer#radiozamaneh-com_far_all_2024-09/

benoit74 avatar Sep 12 '24 08:09 benoit74