Confirm that Internet Archive push is working on every update

Open veganstraightedge opened this issue 4 years ago • 4 comments

Does this still work?

https://github.com/crimethinc/website/issues/451

Let's find out, and then either close this with no further work or fix it so that archive.org always has all of the site's articles.

veganstraightedge Oct 24 '20 23:10

So, looking into this, I think the API has just been going down. Now, rather than 500s, we are getting timeouts.

I am having a hard time figuring out if this API is even supported anymore.

All that said, I did find out that the Internet Archive has a way to just send an email full of links, and they will archive all of the URLs and email you back the results: https://blog.archive.org/2019/10/23/the-wayback-machines-save-page-now-is-new-and-improved/

I wonder if it would be more stable to write an ActionMailer job to run nightly and batch-process articles based on updated_at >= 1.day.ago.
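
A minimal sketch of that nightly batch idea, in plain Ruby so it runs outside the app. The `Article` struct, the `recently_updated` and `archive_email_body` helpers, and the example URLs are all hypothetical stand-ins; in the Rails app the selection would presumably be a query like `Article.where("updated_at >= ?", 1.day.ago)`, and the resulting body would be handed to a mailer.

```ruby
# Hypothetical stand-in for the site's Article model (assumption: the real
# model exposes a public URL and an updated_at timestamp).
Article = Struct.new(:url, :updated_at)

ONE_DAY = 24 * 60 * 60 # seconds

# Return the articles updated within the last day, relative to `now`.
def recently_updated(articles, now: Time.now)
  cutoff = now - ONE_DAY
  articles.select { |article| article.updated_at >= cutoff }
end

# Build a plain-text email body with one URL per line, skipping duplicates.
def archive_email_body(articles)
  articles.map(&:url).uniq.join("\n")
end

now = Time.utc(2020, 11, 13, 3, 0, 0)
articles = [
  Article.new("https://example.com/2020/11/12/fresh", now - 3600),
  Article.new("https://example.com/2019/01/01/stale", now - (3 * ONE_DAY)),
]

# Only the article updated in the last 24 hours ends up in the body.
puts archive_email_body(recently_updated(articles, now: now))
```

Passing `now` in explicitly keeps the cutoff testable; a nightly job would just use the default.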

astronaut-wannabe Nov 13 '20 02:11

I also found this Internet Archive browser extension that has a "Save Page Now" feature that is working, so maybe we can port that code to Ruby?

https://github.com/internetarchive/wayback-machine-webextension/blob/2b46d356f625e28ef98b376541edbe5f7203bbb4/webextension/scripts/background.js#L59-L116

astronaut-wannabe Nov 13 '20 03:11

FYI, these are the docs for the Save Page Now v2 API: https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit#heading=h.1gmodju1d6p0

My buddy did an example implementation in Python here: https://github.com/palewire/savepagenow/pull/31
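
For a Ruby starting point, here is a rough sketch of building a v2 capture request with the standard library. It assumes the shape I understand the linked docs to describe: a form-encoded POST to https://web.archive.org/save with an `Authorization: LOW accesskey:secret` header. Please verify that against the docs before relying on it. The `build_capture_request` helper and the placeholder keys are hypothetical, and the request is only built here, not sent.

```ruby
require "net/http"
require "uri"

# Assumed Save Page Now v2 endpoint (check against the linked docs).
SAVE_ENDPOINT = URI("https://web.archive.org/save")

# Build (but do not send) a capture request for a single page URL.
# access_key and secret_key are archive.org S3-style API keys.
def build_capture_request(page_url, access_key:, secret_key:)
  request = Net::HTTP::Post.new(SAVE_ENDPOINT)
  request["Accept"] = "application/json"
  request["Authorization"] = "LOW #{access_key}:#{secret_key}"
  request.set_form_data("url" => page_url) # form-encoded body: url=...
  request
end

request = build_capture_request(
  "https://example.com/some-article",
  access_key: "ACCESS_KEY", # placeholder, not a real credential
  secret_key: "SECRET_KEY"  # placeholder, not a real credential
)

puts request.method
puts request.path
puts request.body
```

Sending it would be something like `Net::HTTP.start(SAVE_ENDPOINT.host, SAVE_ENDPOINT.port, use_ssl: true) { |http| http.request(request) }`, ideally with explicit timeouts given the timeouts mentioned earlier in this thread.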

bensheldon Nov 16 '23 01:11

Thanks for the reference, @bensheldon. We'll give that a look and try to update our current implementation. 😀

just1602 Nov 16 '23 05:11