ArchiveBot
ArchiveBot, an IRC bot for archiving websites
and should produce the same result as .
```
yipdw: perhaps make the bot also give the final total filesize in the "has finished" message? so that you can tell whether it was successful
```
The ArchiveBot logging and job settings code has caused jobs to stall and prematurely abort a few times now. It should be moved out of the main wpull process and...
PressStar on #archivebot gave us these ignore patterns:
```
tag.*\.js
(user|board|blog)-id.*\.js
(\w+)-p.*\.js
(enable|disable)autocomplete
(notifymoderator|reply|login|registration)page
searchform(v1|\.form\.form)*
kudosbuttonv2
nospellcheck
_change_me_
authentication/contributions/actions
lightboxrendercomponent
renderinlineeditform
surveylauncher:getsurveyurl
facebookconnectbuttonsecondary
```
I've asked for path prefixes where...
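To show how ignore patterns like these could behave, here is a minimal Python sketch. It assumes each pattern is an unanchored, case-sensitive regular expression searched against the full URL; how ArchiveBot actually applies them is not shown here. The `is_ignored` helper and the sample URLs are hypothetical.

```python
import re

# Patterns copied from the list above; everything else is illustrative.
IGNORE_PATTERNS = [
    r"tag.*\.js",
    r"(user|board|blog)-id.*\.js",
    r"(\w+)-p.*\.js",
    r"(enable|disable)autocomplete",
    r"(notifymoderator|reply|login|registration)page",
    r"searchform(v1|\.form\.form)*",
    r"kudosbuttonv2",
    r"nospellcheck",
    r"_change_me_",
    r"authentication/contributions/actions",
    r"lightboxrendercomponent",
    r"renderinlineeditform",
    r"surveylauncher:getsurveyurl",
    r"facebookconnectbuttonsecondary",
]

# Compiling up front also surfaces any malformed pattern as re.error.
COMPILED = [re.compile(p) for p in IGNORE_PATTERNS]


def is_ignored(url):
    """Return True if any ignore pattern matches somewhere in the URL."""
    return any(rx.search(url) for rx in COMPILED)


# Hypothetical URLs, made up for this example.
sample_urls = [
    "http://forum.example.com/t5/board/kudosbuttonv2/123",
    "http://forum.example.com/t5/General/Hello-world/td-p/42",
]
kept = [u for u in sample_urls if not is_ignored(u)]
```

Because the patterns are unanchored, a short one such as `tag.*\.js` can match far more URLs than intended, which is presumably why path prefixes were requested.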
This could be an issue in wpull and/or in the way ArchiveBot uses the wpull size hooks.
I was trying to archive a Twitch VOD when I encountered the above error.
Dashboard output: https://gist.githubusercontent.com/infrequent/ebdc3e5881928c14114a/raw/85c77eb5a41be0493c7def6526ed6662693b5728/gistfile1.txt
Command issued on IRC: `!ao http://www.twitch.tv/shaboozey/v/12288654 --youtube-dl`
Job ID: 7p1afhyxcdn0zx79oo8er5uti
If a new job is created and there are multiple open pipelines, ping them all and give the job to the one with the lowest ping time. (Props to MrRadar...
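The selection rule described above might look like the following Python sketch. It is not ArchiveBot's implementation: `pick_pipeline`, the `ping_fn` callback (assumed to return a round-trip time in seconds, or `None` for an unreachable pipeline), and the pipeline names are all hypothetical.

```python
def pick_pipeline(pipeline_ids, ping_fn):
    """Return the id of the pipeline with the lowest ping time.

    ping_fn is a hypothetical callback: it takes a pipeline id and returns
    the measured round-trip time in seconds, or None if the pipeline did
    not answer. Returns None when no pipeline is reachable.
    """
    timings = {}
    for pid in pipeline_ids:
        rtt = ping_fn(pid)
        if rtt is not None:
            timings[pid] = rtt
    if not timings:
        return None
    # min() over the dict keys, ordered by their measured round-trip time.
    return min(timings, key=timings.get)


# Made-up measurements standing in for real pings.
fake_rtts = {"pipeline-a": 0.120, "pipeline-b": 0.045, "pipeline-c": None}
chosen = pick_pipeline(fake_rtts, fake_rtts.get)
```

Here `chosen` is `"pipeline-b"`. A single ping is a noisy measurement, so a real version might average several pings or also weigh how many jobs each pipeline is already running.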
```
[14:04:28] the !ig command should probably validate the regex before adding it
[14:04:34] DFJustin: good point
[14:04:43] it should, but the bot's in Ruby and the pipeline's in python...
```
ArchiveBot needs a new command, something like... `!whatareyouworkingon ` or `!pipelinestatus ` ...that gives back some basic pipeline stats, number of jobs it can potentially handle, the number of jobs...
It would be nice to have a way to see the contents of ignoresets without having to look them up in the source code.