pywb
How to find out which page a record came from
I would like to know which page a record was originally loaded from. Is this possible? Let's say a Google font is used on the page https://example.com/about. A response or revisit record for it is in one of the WARC files, but I don't see a way to tell whether it was used on https://example.com/about or on https://example.com/other.
One setting I would like is for each page to get its own WARC file, with the resources used on that page written to the same WARC file. This would work nicely together with the dedup_policy: revisit or keep setting.
Note that the Referer header is not sufficient, especially with cross-domain resources.
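To show what I mean, here is a rough sketch that collects the Referer values stored in the request records of a WARC file, grouped by target URL. It is a minimal standard-library reader written only for this illustration, not pywb's or warcio's API. The function names (`open_warc`, `iter_warc_records`, `referers_by_url`) are made up, and it assumes the WARC contains request records with a `WARC-Target-URI` header.

```python
import gzip


def open_warc(path):
    """Open a WARC file as a byte stream; .gz files are decompressed on the fly."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


def iter_warc_records(stream):
    """Yield (warc_headers, block_bytes) for each record in a WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # end of the WARC header section
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", 0))
        yield headers, stream.read(length)


def referers_by_url(stream):
    """Map each requested URL to the set of Referer values in its request records."""
    result = {}
    for headers, block in iter_warc_records(stream):
        if headers.get("warc-type") != "request":
            continue
        url = headers.get("warc-target-uri", "")
        referer = None  # stays None when the request carried no Referer header
        http_head = block.split(b"\r\n\r\n", 1)[0]
        for raw in http_head.split(b"\r\n")[1:]:  # skip the request line
            name, _, value = raw.decode("latin-1").partition(":")
            if name.strip().lower() == "referer":
                referer = value.strip()
        result.setdefault(url, set()).add(referer)
    return result
```

Calling `referers_by_url(open_warc("some.warc.gz"))` (placeholder filename) on a capture of the example above would, as far as I can tell, show the stylesheet URL or just an origin as the Referer for a cross-domain font file, not https://example.com/about, so the page cannot be recovered this way.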