browsertrix-crawler
parentURL / sourceURL / Referer: flag to enable parentURL recording?
It would be really great if there were a flag that enables the parent URL to also be recorded in the file with the crawled URLs,
so we can know where each crawled page came from.
Do you mean per-URL or per-page, e.g. recording in pages.jsonl which page each page was discovered from, or recording for each URL which page it is part of? Is adding it to the logs sufficient, or are you looking to see this in the WACZ/WARC data?