ArchiveBot, an IRC bot for archiving websites

Results 138 ArchiveBot issues

`^https://discord\.com/assets/`: hundreds of URLs that waste time and resources, because they probably get captured on cursory crawls anyway. ![image](https://user-images.githubusercontent.com/289437/126982929-24cdab9e-b8bf-4118-927d-cb3b986dedb5.png)
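
As a rough illustration of what the proposed ignore pattern would filter, here is a minimal sketch. Only the regular expression comes from the issue; the helper name `is_ignored` and the sample URLs are hypothetical, and this is not how ArchiveBot itself applies ignores.

```python
import re

# Pattern copied from the issue; everything else is illustrative.
IGNORE_PATTERN = re.compile(r"^https://discord\.com/assets/")


def is_ignored(url: str) -> bool:
    """Return True if the URL matches the proposed ignore pattern."""
    return IGNORE_PATTERN.search(url) is not None


# Hypothetical URLs, used only to show the filtering effect.
urls = [
    "https://discord.com/assets/example.js",
    "https://discord.com/invite/example",
]
kept = [u for u in urls if not is_ignored(u)]
```

Because the pattern is anchored with `^` and escapes the dot, it only matches URLs that start with exactly `https://discord.com/assets/`.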

As I understand it, jobs are currently started without concurrency or delay settings, and those are later set by the settings monitor. This means that a job always starts at...

enhancement
pipeline

travis-ci.org was discontinued earlier this month, so the tests no longer run. Rather than switching to another proprietary platform (like travis-ci.com or GitHub Actions) that will be changing again in...

When ArchiveBot hits a .swf file, it should decompile it and search for URLs in the ActionScript. This may be tricky to implement, but it would fix most problems that...

enhancement
pipeline
upstream

`DownloadUrlFile` does not verify that the server responded with an HTTP 200. This morning, there was an issue, which led to lots of errors and occasional 502s. The latter were...

bug
pipeline
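
The kind of check the `DownloadUrlFile` report asks for could look roughly like the sketch below. It is not the actual `DownloadUrlFile` code; `UnexpectedStatusError` and `require_ok` are hypothetical names, and the caller is assumed to already have the response's status code.

```python
class UnexpectedStatusError(Exception):
    """Raised when a URL-list download does not return HTTP 200."""


def require_ok(status_code: int, url: str) -> None:
    """Fail loudly instead of treating an error page as a URL list."""
    if status_code != 200:
        raise UnexpectedStatusError(f"{url} returned HTTP {status_code}")
```

With a check like this, a 502 response would abort the step rather than being processed as if it were the requested file.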

I've noticed that sometimes, URLs are not retried properly. The most recent example is job 172fw8g4egszevx4i56uu06cm. One of about 1700 such URLs on that job: ``` $ zstdgrep -F 'https://usc.gov.mm/?q=node/66'...

bug
investigation
pipeline

>NOTE: I am going to do it myself but since I forgot to bring my laptop today this is a reminder to do it later today when I can work...

enhancement
ignores

If ArchiveBot detects that a site is a MediaWiki wiki, it should go to Special:AllPages. Example: https://apple.fandom.com/wiki/Special:AllPages

enhancement
pipeline
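
One possible shape for the MediaWiki request is sketched below. It assumes detection via the `generator` meta tag that MediaWiki pages normally emit, and it assumes the wiki uses the `/wiki/` article path, as in the Fandom example; other wikis may use a different path. The function name `all_pages_url` is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Assumption: the page advertises MediaWiki in its generator meta tag.
GENERATOR_RE = re.compile(
    r'<meta\s+name="generator"\s+content="MediaWiki[^"]*"', re.IGNORECASE
)


def all_pages_url(page_url: str, html: str, article_path: str = "/wiki/") -> Optional[str]:
    """Return the Special:AllPages URL if the page looks like MediaWiki, else None."""
    if not GENERATOR_RE.search(html):
        return None
    return urljoin(page_url, article_path + "Special:AllPages")
```

The returned URL could then be queued as an extra starting point for the crawl.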

While global deduplication for everything in ArchiveBot is not feasible, we should consider adding something for certain URLs that waste a lot of disk space, probably shouldn't be ignored entirely,...

enhancement
backend
pipeline

Cf. #490 and #491. Environment variables on the preflight test are not modified, but inside the pipeline, only the selected ones specified in `wpull_env` in `pipeline.py` are passed to wpull....

bug
pipeline
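
The mismatch in the environment-variable report can be pictured with a small sketch of allow-list filtering. This is not the real `wpull_env` code from `pipeline.py`; `select_env` and the variable names are hypothetical and only show why a preflight check that sees the full environment can pass while the child process fails.

```python
def select_env(full_env: dict, allowed_names: list) -> dict:
    """Keep only the allow-listed variables, as a filtered child environment would."""
    return {name: full_env[name] for name in allowed_names if name in full_env}


# Hypothetical values: the preflight test sees full_env, the child sees child_env.
full_env = {"PATH": "/usr/bin", "SOME_SETTING": "1"}
child_env = select_env(full_env, ["PATH"])
```

Here `SOME_SETTING` is visible to the preflight test but never reaches the child process.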