Zimit2: Youtube videos are not working everywhere

Open benoit74 opened this issue 3 months ago • 4 comments

We have to fix the situation where Youtube videos are not working everywhere.

We typically know that they do not play in kiwix-serve on Android Firefox / Chrome (while they should), and it looks like they do not play in kiwix-serve on Windows either: https://github.com/openzim/warc2zim/issues/206#issuecomment-2022247860

benoit74 avatar Mar 27 '24 09:03 benoit74

This is in fact a Zimit issue, and most probably has nothing to do with Zimit2. I'm transferring it to zimit repo and will give more explanations once transferred.

benoit74 avatar Mar 27 '24 10:03 benoit74

I've done some tests with zimit2 and warc2zim2 (url_handling branch from PR https://github.com/openzim/warc2zim/pull/218 but, as we will see, it does not matter).

Browsertrix crawler is hence 1.0.0 beta-6

I ran 4 different tests:

  • A. crawling with default zimit settings: no --mobileDevice and zimit custom user agent
    • crawl --failOnFailedSeed --waitUntil load --behaviors "autoplay,autofetch,siteSpecific" --url "https://tmp.kiwix.org/ci/test-website/youtube.html" --userAgent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15 +Zimit [email protected]" --cwd /output/.tmppqvsfui5 --combineWARC
    • WARC is at https://tmp.kiwix.org/ci/test-youtube/youtube_uaz_2024-03-27.warc.gz
    • ZIM is at https://tmp.kiwix.org/ci/test-youtube/youtube_uaz_2024-03-27.zim
  • B. crawling with a --mobileDevice and no user agent customization
    • crawl --failOnFailedSeed --waitUntil load --behaviors "autoplay,autofetch,siteSpecific" --url "https://tmp.kiwix.org/ci/test-website/youtube.html" --mobileDevice "Pixel 2" --cwd /output/.tmppqvsfui5 --combineWARC
    • WARC is at https://tmp.kiwix.org/ci/test-youtube/youtube_pixel2_2024-03-27.warc.gz
    • ZIM is at https://tmp.kiwix.org/ci/test-youtube/youtube_pixel2_2024-03-27.zim
  • C. crawling with a --mobileDevice and zimit user agent customization
    • crawl --failOnFailedSeed --waitUntil load --behaviors "autoplay,autofetch,siteSpecific" --url "https://tmp.kiwix.org/ci/test-website/youtube.html" --mobileDevice "Pixel 2" --userAgent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15 +Zimit [email protected]" --cwd /output/.tmppqvsfui5 --combineWARC
    • WARC is at https://tmp.kiwix.org/ci/test-youtube/youtube_pixel2_uaz_2024-03-27.warc.gz
    • ZIM is at https://tmp.kiwix.org/ci/test-youtube/youtube_pixel2_uaz_2024-03-27.zim
  • D. crawling without a --mobileDevice but with a user-agent looking like a Pixel 2:
    • crawl --failOnFailedSeed --waitUntil load --behaviors "autoplay,autofetch,siteSpecific" --url "https://tmp.kiwix.org/ci/test-website/youtube.html" --userAgent "Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3765.0 Mobile Safari/537.36 +Zimit [email protected]" --cwd /output/.tmppqvsfui5 --combineWARC
    • WARC is at https://tmp.kiwix.org/ci/test-youtube/youtube_uap_2024-03-27.warc.gz
    • ZIM is at https://tmp.kiwix.org/ci/test-youtube/youtube_uap_2024-03-27.zim
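The four crawls above differ only in whether --mobileDevice and --userAgent are passed. A minimal sketch of that matrix, assuming a hypothetical helper `build_crawl_args` (this is not zimit code). The user-agent strings are shortened to placeholders, and the temporary --cwd directory is left out:

```python
from typing import Dict, List, Optional

# Arguments shared by all four test crawls (taken from the commands above).
BASE_ARGS = [
    "crawl",
    "--failOnFailedSeed",
    "--waitUntil", "load",
    "--behaviors", "autoplay,autofetch,siteSpecific",
    "--url", "https://tmp.kiwix.org/ci/test-website/youtube.html",
]

# Placeholders only: substitute the full strings used in tests A and D.
DESKTOP_UA = "<desktop-like UA from test A>"
PIXEL_UA = "<Pixel 2-like UA from test D>"


def build_crawl_args(
    mobile_device: Optional[str] = None,
    user_agent: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: return the crawler argument list for one variant."""
    args = list(BASE_ARGS)
    if mobile_device is not None:
        args += ["--mobileDevice", mobile_device]
    if user_agent is not None:
        args += ["--userAgent", user_agent]
    args.append("--combineWARC")
    return args


VARIANTS: Dict[str, List[str]] = {
    "A": build_crawl_args(user_agent=DESKTOP_UA),
    "B": build_crawl_args(mobile_device="Pixel 2"),
    "C": build_crawl_args(mobile_device="Pixel 2", user_agent=DESKTOP_UA),
    "D": build_crawl_args(user_agent=PIXEL_UA),
}

if __name__ == "__main__":
    for name, args in VARIANTS.items():
        print(name, " ".join(args))
```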
Device / Reader A B C D
MacOS 12.7.4 - Kiwix reader opened in Firefox
MacOS 12.7.4 - Kiwix native app (3.3.0 build 145) ✅ (very slow to load) ✅ (very slow to load)
iPhone 13 (iOS 15) - Kiwix reader opened in Safari
Fairphone 4 5G (Android 13) - Kiwix reader opened in Firefox
Fairphone 4 5G (Android 13) - Kiwix reader opened in Firefox

Even if testing more readers will be important, the conclusion seems pretty clear.

Conclusion

For Youtube videos (at least), we must use a different userAgent from the current one.

Previous work on https://github.com/openzim/zimit/pull/229 (where we switched to a mandatory UA by default and chose to use a "desktop-like" UA) was not an entirely good idea. It helped solve some problems with the Python check of the URL, but caused other issues like this one.

Now that the Python check of the URL is gone, we should probably roll back most of the PR 229 changes:

  • not make the UA mandatory anymore
  • stop concatenating --userAgent and --userAgentSuffix in zimit code
  • pass the --userAgentSuffix argument to browsertrix crawler again

I also recommend setting a default --mobileDevice, so that a proper userAgent is passed (concatenated with our default userAgentSuffix), since it seems mostly mandatory for proper zimit operation. I also suggest adding support for a new --noMobileDevice flag, which would omit the --mobileDevice argument from the browsertrix crawler CLI call (should someone not want to use a mobileDevice; probably rare, but cheap to implement, and probably not worth exposing on Zimfarm).
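The proposed flag handling can be sketched as follows. This is only an illustration under stated assumptions, not the actual zimit implementation: `build_parser` and `crawler_args` are hypothetical names, and only the three flags discussed here are modelled.

```python
import argparse
from typing import List

# Default device used for the tests in this issue.
DEFAULT_MOBILE_DEVICE = "Pixel 2"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the flags discussed here."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--mobileDevice", default=DEFAULT_MOBILE_DEVICE)
    parser.add_argument("--noMobileDevice", action="store_true")
    parser.add_argument("--userAgentSuffix", default=None)
    return parser


def crawler_args(argv: List[str]) -> List[str]:
    """Translate zimit-style options into arguments for the crawler call."""
    opts = build_parser().parse_args(argv)
    out: List[str] = []
    # --noMobileDevice drops the device entirely, default included.
    if not opts.noMobileDevice:
        out += ["--mobileDevice", opts.mobileDevice]
    # The suffix is forwarded untouched instead of being concatenated
    # with a user agent on the zimit side.
    if opts.userAgentSuffix:
        out += ["--userAgentSuffix", opts.userAgentSuffix]
    return out


if __name__ == "__main__":
    print(crawler_args([]))
    print(crawler_args(["--noMobileDevice"]))
```

With no options the sketch forwards the default device; --noMobileDevice yields no device argument at all, leaving the crawler to its own defaults.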

Then comes the question of which default mobileDevice to choose. For tests I chose Pixel 2; the full list is here: https://github.com/puppeteer/puppeteer/blob/b144935789315697254880015847b2b4d151d52b/packages/puppeteer-core/src/common/Device.ts. A smaller screen might lead to situations where we are served a smaller asset, which is more or less what we prefer, both to keep the ZIM size small and to work on all screen sizes. This was my logic when I chose Pixel 2 for the tests.

benoit74 avatar Mar 27 '24 12:03 benoit74

Edit: fixed the test table; the second device was wrong

benoit74 avatar Mar 27 '24 12:03 benoit74

Note: I've also checked that in all cases the retrieved video is identical (same size, same codecs, ...), so the "fix" induced by using a more appropriate user-agent is only linked to "other" content, not to the video codec or anything like that.

benoit74 avatar Mar 27 '24 15:03 benoit74

Solved by https://github.com/openzim/zimit/pull/292

benoit74 avatar Apr 05 '24 07:04 benoit74

Just to confirm that the solutions B and D both work in the PWA and the Browser Extension. Was version B the adopted solution?

Jaifroid avatar Apr 16 '24 14:04 Jaifroid

Yes, solution B is currently in place in zimit2 branch

benoit74 avatar Apr 16 '24 15:04 benoit74

To be more precise, by default, "Pixel 2" is used as the mobile device. The Zimit user is free to override this setting with --mobileDevice (as before) or to use --noMobileDevice to remove the default and use no mobile device.

benoit74 avatar Apr 16 '24 15:04 benoit74