MemGator
Question about, and suggested links for archives.json
Is it anticipated that all Memento-compatible web archives will be listed in the archives.json file?
Today I had a look at a list I maintain, and I believe these all offer the "timegate" component of Memento. I am not sure how to access their TimeMaps (perhaps that means they are not compatible?):
- Library of Catalonia: https://wayback.padicat.cat/wayback
- Croatian Web Archive: https://haw.nsk.hr/wayback
- Croatian Web Archive (English Language): https://haw.nsk.hr/en
- Estonian Web Archive: https://veebiarhiiv.digar.ee/a/
- National and University Slovenian Library: https://arhiv.nuk.uni-lj.si/wayback/
- York University Ontario: https://digital.library.yorku.ca/wayback/
NB. below would be a correction to LAC Canada:
- Library and Archives Canada: https://webarchiveweb.wayback.bac-lac.canada.ca/
Connected to: https://github.com/oduwsdl/MemGator/issues/139
@ross-spencer The archives.json is manually maintained. /cc @ibnesayeed
The TimeGate for a URI-R (the URI of the original resource, i.e. the live web URI) can be determined when accessing a memento. For instance, the Library of Catalonia has a capture of https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/ at https://wayback.padicat.cat/wayback/20230217071002/https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/ .
If you look at the HTTP response headers when accessing https://wayback.padicat.cat/wayback/20230217071002/https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/ , you will see a Link header (RFC 5988/8288) with a value containing...
...
, <https://wayback.padicat.cat/wayback/timemap/link/https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/>; rel="timemap"; type="application/link-format", <https://wayback.padicat.cat/wayback/https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/>; rel="timegate",
...
Commas delimit the individual links within the Link field here. So, from this header, we can identify a URI-T (the URI of a TimeMap) at https://wayback.padicat.cat/wayback/timemap/link/https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/ . We identify it not by extracting semantics from the URI itself, but because the rel value of timemap that follows it provides the semantics for this URI.
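To make the discovery step concrete, here is a minimal Python sketch of reading the rel values out of a Link header. It is not part of MemGator; the helper names (`fetch_link_header`, `parse_link_header`, `find_by_rel`) are hypothetical, and the regular expressions cover only the common `<URI>; name="value"` shape of RFC 8288 rather than the full grammar. Quoted values are matched as a unit because a Memento `datetime` attribute contains a comma. `fetch_link_header` is shown for completeness and assumes the archive answers HEAD requests, which I have not checked.

```python
import re
import urllib.request

# One link-value: <URI> followed by zero or more ;name=value parameters.
LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w*-]+\s*(?:=\s*(?:"[^"]*"|[^\s;,"]+))?)*)'
)
# One parameter: ;name="quoted value" or ;name=token
PARAM = re.compile(r';\s*([\w*-]+)\s*(?:=\s*(?:"([^"]*)"|([^\s;,"]+)))?')


def fetch_link_header(uri_m):
    """Return the Link header value(s) of a memento (assumes HEAD is supported)."""
    request = urllib.request.Request(uri_m, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return ", ".join(response.headers.get_all("Link") or [])


def parse_link_header(value):
    """Return a list of (uri, params) tuples found in a Link header value."""
    links = []
    for match in LINK_VALUE.finditer(value):
        uri, raw_params = match.group(1), match.group(2)
        params = {}
        for name, quoted, token in PARAM.findall(raw_params):
            params[name.lower()] = quoted or token
        links.append((uri, params))
    return links


def find_by_rel(value, rel):
    """Return the URIs whose rel attribute lists the given relation type."""
    return [
        uri
        for uri, params in parse_link_header(value)
        if rel in params.get("rel", "").lower().split()
    ]


# The excerpt quoted above from the Library of Catalonia response.
header = (
    '<https://wayback.padicat.cat/wayback/timemap/link/'
    'https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/>; '
    'rel="timemap"; type="application/link-format", '
    '<https://wayback.padicat.cat/wayback/'
    'https://canalsalut.gencat.cat/ca/salut-a-z/c/covid-19/>; rel="timegate"'
)
print(find_by_rel(header, "timemap"))   # the URI-T
print(find_by_rel(header, "timegate"))  # the URI-G
```

The lookup keys on the rel value only and never inspects the URI path, which mirrors the point above: the relation type, not the URI pattern, tells you which link is the TimeMap.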
I checked some of the others, which provide a similar discovery method.
As an aside, Memento compatibility can be determined by the online Memento validator and a Python module for local testing. See this writeup from JCDL '22 for more info.