parentURL / sourceURL / Referer: flag to enable parentURL recording?

Open · Dooriin opened this issue

It would be really great if there were a flag that enables the parent URL to also be recorded in the file of crawled URLs.

That way we could know where each crawled page came from.

Nov 09 '23 00:11 Dooriin

Do you mean per page or per URL? For example, recording in pages.jsonl which page each page was discovered from, or recording for each URL which page it is part of? Would adding it to the logs be sufficient, or are you looking to see this in the WACZ/WARC data?

Nov 16 '23 16:11 ikreymer