Question about, and suggested links for archives.json

Open ross-spencer opened this issue 1 year ago • 1 comments

Is it anticipated that all Memento-compatible web archives will be listed in the archives.json file?

Today I had a look at a list I maintain, and I believe these all offer the "timegate" component of Memento. I am not sure how to access their time maps (perhaps that means they are not compatible?):

  • Library of Catalonia: https://wayback.padicat.cat/wayback
  • Croatian Web Archive: https://haw.nsk.hr/wayback
  • Croatian Web Archive (English Language): https://haw.nsk.hr/en
  • Estonian Web Archive: https://veebiarhiiv.digar.ee/a/
  • National and University Slovenian Library: https://arhiv.nuk.uni-lj.si/wayback/
  • York University Ontario: https://digital.library.yorku.ca/wayback/

NB: the entry below would be a correction to LAC Canada:

  • Library and Archives Canada: https://webarchiveweb.wayback.bac-lac.canada.ca/

Connected to: https://github.com/oduwsdl/MemGator/issues/139

ross-spencer avatar Apr 24 '23 11:04 ross-spencer

@ross-spencer The archives.json is manually maintained. /cc @ibnesayeed

The TimeGate for a URI-R (the original URI, i.e., the live-web URI) can be determined when accessing a memento. For instance, the Library of Catalonia has a capture of https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/ at https://wayback.padicat.cat/wayback/20230217071002/https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/ .

If you look at the HTTP response headers when accessing https://wayback.padicat.cat/wayback/20230217071002/https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/ , you will see a Link header (RFC 5988/8288) whose value contains...

...
, <https://wayback.padicat.cat/wayback/timemap/link/https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/>; rel="timemap"; type="application/link-format", <https://wayback.padicat.cat/wayback/https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/>; rel="timegate", 
...

Commas are the Link field delimiters here. So, from this header, we can identify a URI-T (URI of a TimeMap) at https://wayback.padicat.cat/wayback/timemap/link/https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/ . This is not inferred from the structure of the URI itself; rather, the rel value of timemap that follows the URI supplies its semantics. In the same way, the entry with rel="timegate" identifies the TimeGate at https://wayback.padicat.cat/wayback/https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/ .
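To make the discovery step concrete, here is a minimal Python sketch that pulls the TimeMap and TimeGate URIs out of a Link header value by their rel values. The function name links_by_rel and the regular expressions are my own illustration, not part of MemGator or any Memento library, and the parsing is deliberately simplified (it assumes well-formed link-values and does not cover every corner of RFC 8288). The sample header is only the excerpt quoted above, not the full response.

```python
import re

# One link-value: <URI> followed by zero or more ;name=value parameters.
# Quoted parameter values may contain commas (e.g. a datetime), so they are
# matched as a unit rather than splitting the header on every comma.
LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w*-]+\s*=\s*(?:"[^"]*"|[^\s";,]+))*)'
)
PARAM = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^\s";,]+))')


def links_by_rel(link_header):
    """Map each rel token in a Link header value to the URIs that carry it."""
    found = {}
    for uri, params in LINK_VALUE.findall(link_header):
        for name, quoted, bare in PARAM.findall(params):
            if name.lower() == "rel":
                # A rel value may hold several space-separated relation types.
                for rel in (quoted or bare).split():
                    found.setdefault(rel.lower(), []).append(uri)
    return found


# Excerpt of the Link header value quoted in this thread.
header = (
    '<https://wayback.padicat.cat/wayback/timemap/link/'
    'https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/>; '
    'rel="timemap"; type="application/link-format", '
    '<https://wayback.padicat.cat/wayback/'
    'https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/>; rel="timegate"'
)

if __name__ == "__main__":
    links = links_by_rel(header)
    print("URI-T:", links.get("timemap"))
    print("URI-G:", links.get("timegate"))
```

To run this against a live archive, the header value could be read from the response to a memento request (for example with urllib.request from the standard library) and passed to the same function. Whether a given archive actually returns these relations is something to check per archive.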

I checked some of the others, which provide a similar discovery method.

As an aside, Memento compatibility can be determined by the online Memento validator and a Python module for local testing. See this writeup from JCDL '22 for more info.

machawk1 avatar Jul 17 '23 16:07 machawk1