Zimit not scraping Cyrillic text

Open Popolechien opened this issue 4 years ago • 7 comments

The target website was https://sattvinfo.net/, apparently a Russian website about satellite TV. The zimit job is here and worked fine, except that it did not pick up any of the Cyrillic text. Could it be that they use a non-standard font and zimit does not know how to fall back to a default?

Screenshot_20210325_102222_org kiwix kiwixmobile

Popolechien avatar Mar 25 '21 09:03 Popolechien

Thank you, I opened an upstream ticket: https://github.com/webrecorder/browsertrix-crawler/issues/36

rgaudin avatar Mar 25 '21 10:03 rgaudin

The fix is now part of wabac.js 2.6.8; updating to that version in warc2zim should fix the issue.

ikreymer avatar Mar 26 '21 16:03 ikreymer

@ikreymer great, thx!

kelson42 avatar Mar 26 '21 16:03 kelson42

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

stale[bot] avatar Jun 02 '21 17:06 stale[bot]

@rgaudin can we close this?

kelson42 avatar Jun 02 '21 17:06 kelson42

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

stale[bot] avatar Aug 03 '21 01:08 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

stale[bot] avatar Sep 21 '22 03:09 stale[bot]