TheTechRobo

Results: 405 comments by TheTechRobo

@upintheairsheep That's already done - see #15; if you mean for Discord History Tracker, that's impossible as DHT doesn't save that data.

The `filefailed` line is now specifically `allow`ed.

That includes the regex for channel URL matching. Right now it's compiled for every line of JSONL...
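For illustration, here is a minimal Python sketch of hoisting the compile out of the per-line loop. The pattern `CHANNEL_URL_RE`, the `url` field name, and the `channel_ids` helper are all hypothetical; the project's real regex and record layout aren't shown in this comment.

```python
import json
import re

# Hypothetical pattern: compiled once at import time, not once per JSONL line.
CHANNEL_URL_RE = re.compile(
    r"^https?://(?:www\.)?youtube\.com/channel/(UC[\w-]{22})$"
)


def channel_ids(jsonl_lines):
    """Yield channel IDs from JSONL records whose (assumed) "url" field matches."""
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        match = CHANNEL_URL_RE.match(record.get("url", ""))
        if match:
            yield match.group(1)
```

Python's `re` module does cache recently used patterns internally, so the gain may be smaller there than in languages without such a cache, but compiling once still avoids the per-line cache lookup.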

There are two options I can think of from here, unless something radically amazing comes up (a way to change an `<a>` or add one where it should be): 1....

Does the CDX API support wildcards in the hostname?

Looks like not directly, but it does support regex. I wonder if that can be used.
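A rough sketch of what that could look like, assuming the CDX server's documented `matchType=domain` and `filter=field:regex` parameters. The `build_cdx_query` helper and the example regex are hypothetical, and I haven't verified how well such a query performs on a large domain.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain, original_regex):
    """Build a CDX URL that scans a domain and its subdomains, then
    keeps only captures whose original URL matches the given regex."""
    params = [
        ("url", domain),
        ("matchType", "domain"),  # include all subdomains of `domain`
        ("filter", f"original:{original_regex}"),
        ("output", "json"),
        ("fl", "original"),
        ("collapse", "urlkey"),
    ]
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


# Hypothetical usage: emulate a hostname wildcard with a regex.
query_url = build_cdx_query(
    "youtube.com", r"https?://[^/]*\.sjl\.youtube\.com/.*"
)
```

The sketch only builds the URL and does not send a request. If the server-side filter turns out to be too slow or too limited, the same regex could be applied client-side to the `original` column instead.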

I don't think that's a good idea as WARCs may always be added to the Wayback Machine. We'd be missing those.

What do you think about filtering for all subdomains of sjl.youtube.com, i.e. https://web.archive.org/cdx/search/cdx?url=*.sjl.youtube.com/*&output=json&fl=original&collapse=urlkey ? Edit: Ah, I see, you can't filter for all subdomains and a specific prefix simultaneously. :/

I'm assuming you mean have it show the results for each individual scraper the moment it's done rather than waiting for all of them? The trouble is, that would use...

Hmm, I'm thinking of something like this in the API code for streaming the API response:

```
for result in YouTubeService.run(id):
    yield result.json()
```

That would certainly work. Streaming in...
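To make the idea concrete, here is a minimal sketch of a generator that emits each scraper's result as soon as it finishes rather than waiting for all of them. It doesn't use the real `YouTubeService`; the `stream_results` function, the `scrapers` mapping, and the newline-delimited JSON output are assumptions for illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed


def stream_results(scrapers, video_id):
    """Run every scraper concurrently and yield one JSON line per scraper,
    in completion order. `scrapers` maps a name to a callable (hypothetical)."""
    with ThreadPoolExecutor(max_workers=max(1, len(scrapers))) as pool:
        futures = {
            pool.submit(fn, video_id): name for name, fn in scrapers.items()
        }
        for future in as_completed(futures):
            name = futures[future]
            try:
                payload = {"scraper": name, "result": future.result()}
            except Exception as exc:
                # One failing scraper should not abort the whole stream.
                payload = {"scraper": name, "error": str(exc)}
            yield json.dumps(payload) + "\n"
```

A web framework's streaming response could then iterate over `stream_results(...)` directly. The trade-off is that results arrive in completion order, so the client has to use the `scraper` field to tell them apart.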