zimit
Zimit2: Youtube videos are not working everywhere
We have to fix the situation where Youtube videos are not working everywhere.
We typically know that they do not play in kiwix-serve on Android Firefox / Chrome (while they should), and it looks like they do not play in kiwix-serve on Windows either: https://github.com/openzim/warc2zim/issues/206#issuecomment-2022247860
This is in fact a Zimit issue, and most probably has nothing to do with Zimit2. I'm transferring it to zimit repo and will give more explanations once transferred.
I've done some tests with zimit2 and warc2zim2 (`url_handling` branch from PR https://github.com/openzim/warc2zim/pull/218, but as we will see it does not matter).
Browsertrix crawler is hence 1.0.0 beta-6
I ran 4 different tests:
- A. crawling with default zimit settings: no `--mobileDevice` and zimit custom user agent
  - `crawl --failOnFailedSeed --waitUntil load --behaviors "autoplay,autofetch,siteSpecific" --url "https://tmp.kiwix.org/ci/test-website/youtube.html" --userAgent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15 +Zimit [email protected]" --cwd /output/.tmppqvsfui5 --combineWARC`
  - WARC is at https://tmp.kiwix.org/ci/test-youtube/youtube_uaz_2024-03-27.warc.gz
  - ZIM is at https://tmp.kiwix.org/ci/test-youtube/youtube_uaz_2024-03-27.zim
- B. crawling with a `--mobileDevice` and no user agent customization
  - `crawl --failOnFailedSeed --waitUntil load --behaviors "autoplay,autofetch,siteSpecific" --url "https://tmp.kiwix.org/ci/test-website/youtube.html" --mobileDevice "Pixel 2" --cwd /output/.tmppqvsfui5 --combineWARC`
  - WARC is at https://tmp.kiwix.org/ci/test-youtube/youtube_pixel2_2024-03-27.warc.gz
  - ZIM is at https://tmp.kiwix.org/ci/test-youtube/youtube_pixel2_2024-03-27.zim
- C. crawling with a `--mobileDevice` and zimit user agent customization
  - `crawl --failOnFailedSeed --waitUntil load --behaviors "autoplay,autofetch,siteSpecific" --url "https://tmp.kiwix.org/ci/test-website/youtube.html" --mobileDevice "Pixel 2" --userAgent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15 +Zimit [email protected]" --cwd /output/.tmppqvsfui5 --combineWARC`
  - WARC is at https://tmp.kiwix.org/ci/test-youtube/youtube_pixel2_uaz_2024-03-27.warc.gz
  - ZIM is at https://tmp.kiwix.org/ci/test-youtube/youtube_pixel2_uaz_2024-03-27.zim
- D. crawling without a `--mobileDevice` but with a user agent looking like a Pixel 2:
  - `crawl --failOnFailedSeed --waitUntil load --behaviors "autoplay,autofetch,siteSpecific" --url "https://tmp.kiwix.org/ci/test-website/youtube.html" --userAgent "Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3765.0 Mobile Safari/537.36 +Zimit [email protected]" --cwd /output/.tmppqvsfui5 --combineWARC`
  - WARC is at https://tmp.kiwix.org/ci/test-youtube/youtube_uap_2024-03-27.warc.gz
  - ZIM is at https://tmp.kiwix.org/ci/test-youtube/youtube_uap_2024-03-27.zim
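The four variants differ only in whether `--mobileDevice` and `--userAgent` are passed. As a rough sketch of how the four command lines could be generated for a re-run, assuming a hypothetical helper `build_crawl_args` (not part of zimit or Browsertrix crawler), with the run-specific `--cwd` argument left out and the contact part of the user agent omitted:

```python
import shlex
from typing import List, Optional

# Arguments shared by all four test crawls (the --cwd temporary directory is left out).
BASE_ARGS = [
    "crawl",
    "--failOnFailedSeed",
    "--waitUntil", "load",
    "--behaviors", "autoplay,autofetch,siteSpecific",
    "--url", "https://tmp.kiwix.org/ci/test-website/youtube.html",
]

# User agents from the tests, without the trailing contact address.
DESKTOP_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15 +Zimit"
)
PIXEL2_LIKE_UA = (
    "Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3765.0 Mobile Safari/537.36 +Zimit"
)

# Hypothetical description of each test variant.
VARIANTS = {
    "A": {"user_agent": DESKTOP_UA},
    "B": {"mobile_device": "Pixel 2"},
    "C": {"mobile_device": "Pixel 2", "user_agent": DESKTOP_UA},
    "D": {"user_agent": PIXEL2_LIKE_UA},
}


def build_crawl_args(
    mobile_device: Optional[str] = None, user_agent: Optional[str] = None
) -> List[str]:
    """Return the crawler argument list for one test variant."""
    args = list(BASE_ARGS)
    if mobile_device is not None:
        args += ["--mobileDevice", mobile_device]
    if user_agent is not None:
        args += ["--userAgent", user_agent]
    args.append("--combineWARC")
    return args


if __name__ == "__main__":
    for name, options in VARIANTS.items():
        print(name, shlex.join(build_crawl_args(**options)))
```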
Device / Reader | A | B | C | D |
---|---|---|---|---|
MacOS 12.7.4 - Kiwix reader opened in Firefox | ✅ | ✅ | ✅ | ✅ |
MacOS 12.7.4 - Kiwix native app (3.3.0 build 145) | ❌ | ✅ (very slow to load) | ❌ | ✅ (very slow to load) |
iPhone 13 (iOS 15) - Kiwix reader opened in Safari | ❌ | ✅ | ❌ | ✅ |
Fairphone 4 5G (Android 13) - Kiwix reader opened in Firefox | ❌ | ✅ | ❌ | ✅ |
Even though testing more readers will be important, the conclusion seems pretty clear.
Conclusion
For Youtube videos (at least), we must use another userAgent than the current one.
Previous work on https://github.com/openzim/zimit/pull/229 (where we switched by default to a mandatory UA and chose to use a "desktop-like" UA) was not entirely a good idea. It helped solve some problems with the Python check of the URL, but caused other issues like this one.
Now that the Python check of the URL is gone, we should probably roll back most of the PR 229 changes:
- no longer make the UA mandatory
- stop concatenating `--userAgent` and `--userAgentSuffix` in zimit code
- pass the `--userAgentSuffix` argument to browsertrix crawler again
I also recommend setting a default `--mobileDevice`, so that a proper userAgent is passed (concatenated with our default userAgentSuffix), since it seems mostly mandatory for proper zimit operation. We should also add support for a new `--noMobileDevice` flag, which would leave the `--mobileDevice` argument out of the browsertrix crawler CLI call (should someone want to not use a mobileDevice; probably rare, but it costs nothing to implement, and it probably does not need to be exposed on Zimfarm).
Then comes the question of which default mobileDevice to choose. For tests I chose Pixel 2; the full list is here: https://github.com/puppeteer/puppeteer/blob/b144935789315697254880015847b2b4d151d52b/packages/puppeteer-core/src/common/Device.ts. A smaller screen might lead to situations where we are served a smaller asset, which is more or less what we prefer in order to keep ZIM size small and work on all screen sizes. This was my logic when I chose Pixel 2 for tests.
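The proposed argument handling could look roughly like the sketch below. It is only an illustration under assumptions: the function name `device_and_ua_args`, its parameters, and the `"+Zimit"` suffix placeholder are hypothetical, "Pixel 2" is assumed as the default because it is the device used in the tests, and the real zimit code may be organized differently.

```python
from typing import List, Optional

DEFAULT_MOBILE_DEVICE = "Pixel 2"  # assumption: the device used in the tests
DEFAULT_UA_SUFFIX = "+Zimit"       # placeholder, not the real zimit suffix


def device_and_ua_args(
    mobile_device: Optional[str] = None,
    no_mobile_device: bool = False,
    user_agent: Optional[str] = None,
    user_agent_suffix: Optional[str] = DEFAULT_UA_SUFFIX,
) -> List[str]:
    """Build the device and user-agent arguments forwarded to the crawler."""
    if no_mobile_device and mobile_device is not None:
        raise ValueError("--mobileDevice and --noMobileDevice are mutually exclusive")

    args: List[str] = []
    # A mobile device is set by default; --noMobileDevice removes it entirely.
    if not no_mobile_device:
        args += ["--mobileDevice", mobile_device or DEFAULT_MOBILE_DEVICE]
    # The UA is optional again and is forwarded untouched when given.
    if user_agent:
        args += ["--userAgent", user_agent]
    # The suffix is passed separately instead of being concatenated here.
    if user_agent_suffix:
        args += ["--userAgentSuffix", user_agent_suffix]
    return args
```

With no options this yields `--mobileDevice "Pixel 2"` plus the suffix, which corresponds to test B with the suffix appended by the crawler rather than by zimit.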
Edit: fix the test table, second device was wrong
Note: I've also checked that in all cases the video which is retrieved is identical (same size, same codecs, ...), so the "fix" induced by using a more appropriate user agent is only linked to "other" content, not to the video codec or anything like that.
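The thread does not say how this check was done. One possible way to do a similar comparison is to fingerprint the video payloads of two crawls by size and digest, ignoring URLs (which may differ between crawls). The sketch below assumes the response records have already been extracted from the WARC files as `(url, content_type, payload)` tuples, for instance with a WARC library; that extraction step and the names `video_fingerprints` and `same_videos` are hypothetical.

```python
import hashlib
from typing import Iterable, Set, Tuple

# One extracted response: (URL, Content-Type header value, payload bytes).
Record = Tuple[str, str, bytes]


def video_fingerprints(records: Iterable[Record]) -> Set[Tuple[int, str]]:
    """Return the set of (size, SHA-256 digest) for all video payloads."""
    fingerprints = set()
    for _url, content_type, payload in records:
        if content_type.startswith("video/"):
            digest = hashlib.sha256(payload).hexdigest()
            fingerprints.add((len(payload), digest))
    return fingerprints


def same_videos(crawl_a: Iterable[Record], crawl_b: Iterable[Record]) -> bool:
    """True when both crawls captured byte-identical video payloads."""
    return video_fingerprints(crawl_a) == video_fingerprints(crawl_b)
```

Byte-identical payloads necessarily share the same size and codecs, so a `True` result would be consistent with the observation above.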
Solved by https://github.com/openzim/zimit/pull/292
Just to confirm that solutions B and D both work in the PWA and the Browser Extension. Was version B the adopted solution?
Yes, solution B is currently in place in the `zimit2` branch.
To be more precise, by default, "Pixel 2" is used as the mobile device. The Zimit user is free to override this setting with `--mobileDevice` (as before) or use `--noMobileDevice` to remove the default and use no mobile device.