TheTechRobo
I've been looking at brozzler's codebase and I see a lot of direct writes to RethinkDB (e.g. https://github.com/internetarchive/brozzler/blob/master/brozzler/model.py#L119). That _would_ be fine, but RethinkDB has weird behaviour regarding errors: https://rethinkdb.com/api/python/insert/...
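The post is cut off above, so which behaviour it means is not shown. Assuming it refers to the documented fact that RethinkDB write commands return a result object with an `errors` count and a `first_error` message instead of raising, a minimal sketch of a check could look like this. `WriteError` and `check_write_result` are hypothetical names, not part of brozzler or the RethinkDB driver.

```python
class WriteError(Exception):
    """Raised when a RethinkDB write result reports one or more errors."""


def check_write_result(result: dict) -> dict:
    """Return the write result unchanged, or raise if it reports errors.

    Assumes `result` is the dict returned by running an insert/update
    query, which carries an `errors` count and, when that count is
    non-zero, a `first_error` message.
    """
    if result.get("errors", 0) > 0:
        raise WriteError(result.get("first_error", "unknown error"))
    return result
```

A caller would wrap each direct write, e.g. `check_write_result(query.run(conn))`, so that a failed write surfaces as an exception rather than being silently ignored.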
Example: https://www.youtube.com/watch?v=jCQd6YqTnOk You should check whether it already says "Premiered" and, if so, don't add the "Published on".
Going to hold off for a few weeks so I can keep an eye on the experiment, but it appears videoinfo is more reliable than fakeurl. (videoinfo endpoint is `https://web.archive.org/__wb/videoinfo?vtype=youtube&vid=$videoid`)
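For reference, a small sketch of building the videoinfo URL quoted above for a given video ID. The function and constant names are made up for illustration, and the response format is not described here, so this only constructs the URL and does not fetch or parse anything.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the post above.
VIDEOINFO_ENDPOINT = "https://web.archive.org/__wb/videoinfo"


def videoinfo_url(video_id: str) -> str:
    """Build the videoinfo lookup URL for a YouTube video ID."""
    query = urlencode({"vtype": "youtube", "vid": video_id})
    return f"{VIDEOINFO_ENDPOINT}?{query}"
```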
The last character of a YouTube video ID can only be one of 16 characters. Until sometime in 2020, YouTube accepted video IDs where this last character was close enough,...
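A rough sketch of checking the last-character rule. It rests on an assumption that is not stated in the post: that an 11-character ID is a 64-bit value in the URL-safe base64 alphabet, so the final character carries only 4 meaningful bits and must be one of every fourth character of that alphabet. `has_canonical_last_char` is a hypothetical helper name.

```python
import string

# URL-safe base64 alphabet, in index order 0..63.
BASE64URL = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"

# Assumption: the low 2 bits of the last character are always zero,
# which leaves the characters at indices 0, 4, 8, ..., 60.
VALID_LAST_CHARS = frozenset(BASE64URL[::4])


def has_canonical_last_char(video_id: str) -> bool:
    """Return True if the ID is 11 base64url characters with a valid last one."""
    return (
        len(video_id) == 11
        and all(c in BASE64URL for c in video_id)
        and video_id[-1] in VALID_LAST_CHARS
    )
```

Under that assumption the 16 allowed final characters are `A E I M Q U Y c g k o s w 0 4 8`.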
I'll probably make a reverse proxy on localhost. https://quart.palletsprojects.com/en/latest/how_to_guides/background_tasks.html
URLs like https://www.youtube.com/watch?v=SB_0vRnkeOk#action=share currently don't work because of the anchor. While this is fixable in the regex, something that would be way easier is actually parsing the URL. Not sure...
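The URL-parsing idea could look roughly like the sketch below, using Python's standard `urllib.parse`. `youtube_video_id` is a hypothetical helper, not code from the project, and it only handles `/watch?v=...` style URLs without checking the hostname.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def youtube_video_id(url: str) -> Optional[str]:
    """Extract the `v` query parameter from a /watch URL, ignoring any #fragment."""
    parts = urlsplit(url)  # splits the fragment off from the query string
    if parts.path != "/watch":
        return None
    values = parse_qs(parts.query).get("v")
    return values[0] if values else None
```

With this, `youtube_video_id("https://www.youtube.com/watch?v=SB_0vRnkeOk#action=share")` returns `"SB_0vRnkeOk"`, since the anchor ends up in `parts.fragment` and never reaches the query parser.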
Previously, Chrome's headless mode was always turned on. This makes it optional when using Brozzler as a module. (If there's interest in a command-line option for this, I can add...
I was surprised to learn how brozzler's stealth JavaScript handles deviceMemory and hardwareConcurrency: https://github.com/internetarchive/brozzler/blob/a665d49bba2db8a4094d075e1e5c0803198745fc/brozzler/js-templates/stealth.js#L30-L35 Every call to either property returns a random value, so even simply `[navigator.hardwareConcurrency, navigator.hardwareConcurrency]` will have...
I notice in [`browse_page`](https://github.com/internetarchive/brozzler/blob/master/brozzler/browser.py#L468) there is a `behavior_dir` parameter which changes the directory in which the behavior file is searched for. However, I can't seem to find anywhere that this is exposed....