Tessa Walsh
Another thing to consider: should we just expect the full filename, including the extension, as the `--log` argument? And if there's an issue with another tool that's using this library...
Thank you @ato, this is very helpful! Documenting the quirks especially - I'll have to look into the Python < 3.7 issues!
We will be modifying pywb and warcio.js to be consistent - necessary issues have been opened. Notably, we'll be making pywb use native JSON values rather than Pythonic values (e.g....
@ato The PR is in and I would love your eyes on it if you have the bandwidth: https://github.com/webrecorder/specs/pull/149
Note that Puppeteer and Playwright use the CDP values but lowercased: https://playwright.dev/docs/api/class-request#:~:text=resourceType%E2%80%8B&text=ResourceType%20will%20be%20one%20of,%2C%20websocket%20%2C%20manifest%20%2C%20other%20
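To illustrate the lowercasing mentioned above, here is a minimal sketch. The helper name `to_lowercase_resource_type` is hypothetical and the list is only a sample of CDP `Network.ResourceType` values, not the full enum; it is not taken from Puppeteer, Playwright, or any Webrecorder tool.

```python
# A small sample of CDP Network.ResourceType values (assumption: not the full enum).
CDP_RESOURCE_TYPES = [
    "Document",
    "Stylesheet",
    "Image",
    "XHR",
    "Fetch",
    "WebSocket",
    "Other",
]


def to_lowercase_resource_type(cdp_type: str) -> str:
    """Hypothetical helper: return the lowercased form of a CDP resource type."""
    return cdp_type.lower()


if __name__ == "__main__":
    # Print each sampled CDP value next to its lowercased form.
    for cdp_type in CDP_RESOURCE_TYPES:
        print(f"{cdp_type} -> {to_lowercase_resource_type(cdp_type)}")
```

A tool that wants to compare values from both sources could normalise to one casing before comparing, but whether that is appropriate depends on the tool.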
Playwright mapping for Firefox: https://github.com/microsoft/playwright/blob/73ffaf65d75b2378168ac5a11eb37cced03ff6ea/packages/playwright-core/src/server/firefox/ffNetworkManager.ts#L161
We've added this to our WARCs in response to a user-submitted issue: https://github.com/webrecorder/browsertrix-crawler/issues/451, with the primary use case being differentiating between resources fetched by JavaScript (via fetch, xhr) versus resources...
Updating with Webrecorder's current practices for screenshots:

## Crawl-time rendering artefacts

| WARC-Type | Content-Type | WARC-Target-URI | Tool |
| ------------ | -------------- | ------------------ | -----|
| `resource`...
Hi @Stuartacus! It looks to me like the `./db_create.py` probably didn't run successfully - can you try again and share whatever output or errors you get here?
Hi @Twi-Hard, have you looked at this part of the pywb documentation? https://pywb.readthedocs.io/en/latest/manual/warcserver.html?highlight=fallback#sequential-fallback-collections

Let us know if something is unclear or missing!