
Jobs using a URL list should also archive the list itself with wpull

Open JustAnotherArchivist opened this issue 6 years ago • 1 comment

When running an !ao < (or !a <) job, the URL list itself is downloaded and (usually*) preserved as a *-urls.txt file. I think it would be nice if the list were also downloaded with wpull into a WARC and thereby made accessible in the Wayback Machine. That would make the list much easier to access than searching for the file in the ArchiveBot collection (if it's there at all).

(* There's no *-urls.txt file when the pipeline crashes while the job is running and the operator doesn't upload it manually.)

JustAnotherArchivist, Mar 12 '19 10:03

An elegant solution, which would also get rid of issues like #353, possibly #338, and #207, would be to remove the pipeline download entirely and instead fetch the list with wpull, but without --delete-after.
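
As a rough illustration of that idea, the sketch below only builds a wpull command line for fetching the URL list itself into a WARC. It is not ArchiveBot's actual pipeline code: the function name `build_url_list_fetch_args`, the example URL, and the WARC prefix are hypothetical, and the only wpull options assumed are `--warc-file` and the deliberate absence of `--delete-after`, so that the downloaded list would stay on disk for the job to read.

```python
import shlex


def build_url_list_fetch_args(list_url, warc_prefix, wpull_exe="wpull"):
    """Return an argv list that fetches the URL list itself with wpull.

    Hypothetical helper, not part of ArchiveBot. The list is written into
    a WARC via --warc-file; --delete-after is intentionally not passed, so
    wpull would leave the downloaded file in place for later use.
    """
    return [
        wpull_exe,
        list_url,
        "--warc-file", warc_prefix,
    ]


if __name__ == "__main__":
    # Placeholder URL and WARC prefix, for illustration only.
    args = build_url_list_fetch_args(
        "https://example.org/urls.txt", "example-job-urls"
    )
    # Print the command instead of running it; the real pipeline would
    # need to decide where the file lands and how the job reads it.
    print(shlex.join(args))
```

How the job would then locate the downloaded list on disk and feed it to the main crawl is left open here, since that depends on pipeline details not covered in this thread.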

JustAnotherArchivist, May 07 '19 02:05