browsertrix
switch to async streaming download:
- download from presigned URLs using aiohttp instead of the boto APIs
- use the async methods from stream-zip to generate the zip (note that stream-zip still does a sync->async conversion under the hood)
- follow-up to #1933 for streaming download improvements
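
For context on the sync->async note above, here is a minimal, stdlib-only sketch of what such a conversion can look like. It is not stream-zip's actual code and not the code in this PR: `sync_to_async_chunks`, `blocking_zip_chunks`, and `collect` are hypothetical names, and the approach shown (running each blocking `next()` call in the default thread pool) is just one common way to bridge a synchronous generator into an async one.

```python
import asyncio
from typing import AsyncIterator, Iterable, Iterator

_DONE = object()  # sentinel marking exhaustion of the sync iterator


def _next_or_done(it: Iterator[bytes]):
    # next() with a default avoids raising StopIteration inside a Future
    return next(it, _DONE)


async def sync_to_async_chunks(chunks: Iterable[bytes]) -> AsyncIterator[bytes]:
    """Yield chunks from a blocking iterable without blocking the event loop."""
    loop = asyncio.get_running_loop()
    it = iter(chunks)
    while True:
        # each blocking next() call runs in the default thread pool executor
        chunk = await loop.run_in_executor(None, _next_or_done, it)
        if chunk is _DONE:
            break
        yield chunk


def blocking_zip_chunks() -> Iterator[bytes]:
    # hypothetical stand-in for a synchronous zip generator
    yield b"PK"
    yield b"data"


async def collect() -> bytes:
    # gather all chunks produced through the async wrapper
    return b"".join([c async for c in sync_to_async_chunks(blocking_zip_chunks())])
```

The point of the sketch is that the event loop stays free while the blocking generator runs, but a thread hop still happens per chunk, so the pipeline is not async end to end.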
When the multi-WACZ files produced in this branch are loaded into ReplayWeb.page, no seed pages or resources are listed. Something may be slightly off; investigating further.
Should be fixed now! Turns out the datapackage.json was not quite valid: the resources had an incorrect path, which was not being returned equal to name. The properties now match those of a single WACZ!
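
A rough sketch of the kind of check that would catch this, assuming each entry in `resources` carries a `name` and a `path` that are expected to be equal. The helper `find_mismatched_resources`, the sample file names, and the `bytes` values are hypothetical and only for illustration; they are not taken from this branch.

```python
import json


def find_mismatched_resources(datapackage: dict) -> list:
    """Return the names of resources whose path differs from their name."""
    return [
        res.get("name")
        for res in datapackage.get("resources", [])
        if res.get("path") != res.get("name")
    ]


# hypothetical datapackage.json content; the second entry has a bad path
sample = json.loads(
    """
    {
      "resources": [
        {"name": "crawl-1.wacz", "path": "crawl-1.wacz", "bytes": 1024},
        {"name": "crawl-2.wacz", "path": "files/crawl-2.wacz", "bytes": 2048}
      ]
    }
    """
)

mismatched = find_mismatched_resources(sample)
```

With the sample above, only the second resource is reported, since its `path` includes a directory prefix that its `name` does not.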
Tested on dev and working well! Nice job