browsertrix-crawler

Disable browser updates

Open rgaudin opened this issue 2 years ago • 2 comments

We've noticed that when a run lasts long enough (it happens consistently in 15-minute runs), the WARC includes data that we did not request: the Google Chrome update files.

Apparently, the browser phones home, realizes it's not up to date and automatically downloads update data… all of this happening inside the proxied environment.

In zimit, we chose to disable updates for now.

See https://github.com/openzim/zimit/issues/172

rgaudin, Mar 10 '23 12:03

@rgaudin Thanks for sharing! Yes, we definitely want to disable this. For reference, how did you find out how to do this? I see Chromium has a bunch of flags related to auto-update as well.
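To make the flags idea concrete, here is a minimal sketch of appending update-related switches to a browser launch argument list. `--disable-component-update` and `--disable-background-networking` are existing Chromium switches, but nothing in this thread confirms that they stop the update download described above, and the linked zimit commit may use a different mechanism. The names `UPDATE_FLAGS` and `with_update_flags` are hypothetical, chosen only for illustration.

```python
# Chromium switches related to background update traffic.
# Assumption: these are candidates to try, not a confirmed fix for this issue.
UPDATE_FLAGS = [
    "--disable-component-update",
    "--disable-background-networking",
]


def with_update_flags(args):
    """Return a copy of args with each update-related switch appended once."""
    merged = list(args)  # copy so the caller's list is left untouched
    for flag in UPDATE_FLAGS:
        if flag not in merged:
            merged.append(flag)
    return merged


if __name__ == "__main__":
    # Hypothetical existing launch arguments for the crawler's browser.
    print(with_update_flags(["--headless", "--disable-component-update"]))
```

Checking the resulting WARC for update files after a run of 15 minutes or more would still be needed to see whether the switches have any effect.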

We are actually considering switching to Brave for crawling, which should not have the same issue.

ikreymer, Mar 14 '23 04:03

https://github.com/openzim/zimit/commit/6324b7c7c521c76e4e12e03b2fa01a44b10234c5

kelson42, Mar 14 '23 05:03

This is no longer an issue in 1.x: we have switched to Brave and no longer record all traffic from the browser, only traffic from certain windows. We can revisit if other issues arise. I think Brave does download some updates while running, which may or may not be possible to disable (can discuss more in #463).

ikreymer, Jun 15 '24 19:06