Tessa Walsh

111 comments by Tessa Walsh

Yay! A bit of spitballing:
* I think some sort of timeline/clustering visualization of the last modified dates could be interesting (although it's dependent on FITS, so won't be available...

Another idea that would also take some backend work: it could be interesting to try to visualize the relationships between the original files and their preservation derivatives - comparing formats...

I totally get that! Sounds good :)

Hi Ashley! I did a bit more layout work on this and think it's ready to go live as-is if you're keen! Of course you're always welcome to open new...

Thank you! I appreciate that you spent some of your rare free time on this, and the door will always be open if you wanna do any more! I'm going...

Hi @dbuenzli, thanks for these comments. In terms of the new fields, yes, perhaps we should create/propose an extension to the core WARC format with these new fields, and...

We are in the process of documenting these new headers and fields, tracking in https://github.com/webrecorder/browsertrix/issues/issue/1588

Improved logging merged in #195. Significant changes include:
- Logs are output as JSON Lines (JSONL) with proper log levels and contexts to support filtering
- Page crawl graph data included
-...
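To give a rough idea of what filtering by level and context could look like, here is a minimal Python sketch. It assumes each log line is a JSON object with `logLevel`, `context`, and `message` keys; those key names, the `pageStatus` context value, and the `filter_logs` helper are assumptions made for illustration, not confirmed by the comment above.

```python
import json


def filter_logs(lines, level=None, context=None):
    """Yield parsed log entries that match the requested level and/or context.

    Assumes one JSON object per line with "logLevel" and "context" keys
    (hypothetical field names; adjust to the actual log schema).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any line that is not valid JSON
        if not isinstance(entry, dict):
            continue
        if level is not None and entry.get("logLevel") != level:
            continue
        if context is not None and entry.get("context") != context:
            continue
        yield entry


# Made-up sample lines, only to exercise the filter.
sample = [
    '{"logLevel": "info", "context": "general", "message": "Crawl started"}',
    '{"logLevel": "error", "context": "pageStatus", "message": "Page load failed"}',
    "plain text line that is not JSON",
]

errors = list(filter_logs(sample, level="error"))
for entry in errors:
    print(entry["context"], entry["message"])
```

In practice the `lines` argument could be an open log file, since iterating over a file object yields one line at a time.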

@despens it seems like the main outstanding issue from your comment is that getting TLDs from `pages.jsonl` can be difficult because of the presence of extracted full text, which seems...
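One possible workaround, sketched in Python: parse each line of `pages.jsonl` as JSON and read only the URL, so the extracted text never needs to be inspected. This assumes each page record is a JSON object with a `url` key and that any header line lacks one; the `count_tlds` helper and the sample records are hypothetical. Taking the last hostname label is a naive stand-in for a TLD and does not handle multi-part public suffixes such as `co.uk`.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def count_tlds(lines):
    """Count the last hostname label of each page URL in JSONL records.

    Assumes each page record has a "url" key (an assumption about the
    file layout); records without one, such as a header line, are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if not isinstance(record, dict):
            continue
        url = record.get("url")
        if not isinstance(url, str):
            continue
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # urlsplit rejects some malformed URLs
        if not host or "." not in host:
            continue
        label = host.rsplit(".", 1)[-1]
        if label.isdigit():
            continue  # last label of an IPv4 address, not a TLD
        counts[label] += 1
    return counts


# Made-up sample records, only to exercise the function.
sample = [
    '{"format": "json-pages-1.0", "id": "pages", "title": "All Pages"}',
    '{"id": "1", "url": "https://example.org/", "text": "long extracted text"}',
    '{"id": "2", "url": "https://webrecorder.net/tools", "title": "Tools"}',
    '{"id": "3", "url": "https://example.org/about"}',
]

tld_counts = count_tlds(sample)
print(tld_counts.most_common())
```

Because only the `url` value is read, the size of any full-text field affects parsing time but not the logic.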

Moved to Playwright in https://github.com/webrecorder/browsertrix-crawler/commit/82808d813321c6c5860a529414e20e2638887b31