Create a new endpoint to return a matched CDXJ record

Open ibnesayeed opened this issue 5 years ago • 8 comments

We need a new endpoint that returns an index record instead of a reconstructed memento. This will let us try fetching IPFS blocks directly from the Service Worker (SW) and reconstructing the memento there instead of letting the server do it. This would be a step toward server-free decentralized replay. It would also eliminate the need for threading, as we can leverage the asynchronous nature of JS for concurrent fetches. Additionally, we can avoid the Location header rewriting issue (as per #456 and #461) by reusing the logic already present in Reconstructive.

/cdxj/:datetime/:urir should return 404 if no record is found, and otherwise return 200 (not a 3xx) with the one entry extracted from the index. We can return either the application/cdxj+ors content type, or application/json if we transform the index record into JSON.

ibnesayeed avatar Aug 07 '18 04:08 ibnesayeed
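
The lookup behind such an endpoint could be sketched as follows. This is only an illustration under stated assumptions, not ipwb's implementation: `surt_key` and `lookup_cdxj` are hypothetical helpers, the index is a plain list of CDXJ lines, metadata lines are assumed to start with `!`, and "matched" is assumed to mean the record closest in time to the requested datetime.

```python
import json
from datetime import datetime
from urllib.parse import urlsplit

TS_FORMAT = "%Y%m%d%H%M%S"  # 14-digit timestamps, e.g. 20140115101500


def surt_key(urir):
    """Rough SURT-style key: reversed host labels plus path (simplified assumption)."""
    parts = urlsplit(urir if "://" in urir else "http://" + urir)
    host = ",".join(reversed(parts.hostname.lower().split(".")))
    return f"{host}){parts.path or '/'}"


def lookup_cdxj(index_lines, dt14, urir):
    """Return (status, content_type, body) for the record closest to dt14."""
    key = surt_key(urir)
    wanted = datetime.strptime(dt14, TS_FORMAT)
    candidates = []
    for line in index_lines:
        if line.startswith("!"):  # assumed marker for CDXJ metadata lines
            continue
        surt, ts, payload = line.split(" ", 2)
        if surt == key:
            candidates.append((ts, payload))
    if not candidates:
        return 404, "text/plain", "No matching record"
    # Pick the capture whose datetime is nearest to the requested one.
    ts, payload = min(
        candidates,
        key=lambda c: abs(datetime.strptime(c[0], TS_FORMAT) - wanted),
    )
    record = {"surt": key, "datetime": ts, **json.loads(payload)}
    return 200, "application/json", json.dumps(record)
```

A web framework route for /cdxj/:datetime/:urir would then only need to turn the returned tuple into a response.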

@ibnesayeed It seems that the goal you described for server-less replay would make achieving #434 even more difficult. Please comment on this, as I am motivated to integrate Prefer support, which is relevant to my (our) research.

machawk1 avatar Aug 07 '18 14:08 machawk1

No, the two are independent things. We can continue implementing support for the Prefer header for raw mementos. However, with client-side IPFS fetch, we will need neither the raw nor the rewritten mementos, as we will be performing the composition on the client side directly. Those accessing the server without the SW in place would need to talk to the regular memento endpoint.

ibnesayeed avatar Aug 07 '18 16:08 ibnesayeed

@ibnesayeed Do you think the CDXJ meta headers should be included in the response?

I am working on an implementation that leverages our currently existing functionality and want to be sure I route through the right functions so as to not have to duplicate functionality.

machawk1 avatar Aug 07 '18 18:08 machawk1

> Do you think the CDXJ meta headers should be included in the response?

We can think about that later when we start consuming the response. We might just create a JSON object that has all the necessary bits from the matched record and any other necessary metadata in it.

ibnesayeed avatar Aug 07 '18 20:08 ibnesayeed

> ...just create a JSON object...

Per our verbal discussion, please outline how you expect this JSON object to look, e.g., including all the Memento-esque relations. Just an example ought to get us moving in the right direction to make this endpoint more usable for the replay banner.

machawk1 avatar Aug 07 '18 23:08 machawk1

@ibnesayeed Please document here the alternative Prefer semantics you described to me verbally in lieu of having a CDXJ endpoint.

machawk1 avatar Aug 08 '18 02:08 machawk1

I think we are looking for something like this:

$ curl -i -H "Prefer: return=minimal" "http://localhost:5000/memento/20140115101500/memento.us/"
HTTP/1.0 200 
Preference-Applied: return=minimal
Content-Type: application/json
Memento-Datetime: Wed, 15 Jan 2014 10:15:00 GMT
Link: <http://memento.us/>; rel="original",
 <http://localhost:5000/timemap/link/memento.us/>; rel="timemap"; type="application/link-format",
 <http://localhost:5000/timemap/cdxj/memento.us/>; rel="timemap"; type="application/cdxj+ors",
 <http://localhost:5000/timegate/memento.us/>; rel="timegate",
 <http://localhost:5000/memento/20130202100000/memento.us/>; rel="first memento"; datetime="Sat, 02 Feb 2013 10:00:00 GMT",
 <http://localhost:5000/memento/20140114100000/memento.us/>; rel="prev memento"; datetime="Tue, 14 Jan 2014 10:00:00 GMT",
 <http://localhost:5000/memento/20140115101500/memento.us/>; rel="memento"; datetime="Wed, 15 Jan 2014 10:15:00 GMT",
 <http://localhost:5000/memento/20161231110000/memento.us/>; rel="next memento"; datetime="Sat, 31 Dec 2016 11:00:00 GMT",
 <http://localhost:5000/memento/20161231110001/memento.us/>; rel="last memento"; datetime="Sat, 31 Dec 2016 11:00:01 GMT"
Server: InterPlanetary Wayback Replay/0.2018.08.08.0200
Date: Wed, 08 Aug 2018 21:39:39 GMT
Content-Length: 272

{
  "surt": "us,memento)/",
  "datetime": "20140115101500",
  "locator": "urn:ipfs/QmbyEELu2DNagj4bvdxCb4N7XHeSQEupbEugXTqnQ6QBGE/QmXDsUhfSzvtTwakyt6McXnjpzAw2BQvAcVdSCWSp2Tfge",
  "original_uri": "http://memento.us/",
  "mime_type": "text/html",
  "status_code": "200"
}

ibnesayeed avatar Aug 08 '18 21:08 ibnesayeed
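
A client consuming a record like the one above would first have to split the locator into the hashes it needs to fetch. The following is a minimal sketch, assuming the locator always has the form urn:ipfs/<first hash>/<second hash> and that the two segments identify the stored HTTP headers and the payload respectively; `split_locator` is a hypothetical helper, and the actual IPFS fetch is left out.

```python
import json


def split_locator(record_json):
    """Split an index record's locator into (header_hash, payload_hash).

    Assumes a locator of the form "urn:ipfs/<header hash>/<payload hash>".
    """
    record = json.loads(record_json)
    parts = record["locator"].split("/")
    if len(parts) != 3 or parts[0] != "urn:ipfs":
        raise ValueError(f"unexpected locator: {record['locator']}")
    _, header_hash, payload_hash = parts
    return header_hash, payload_hash
```

With both hashes in hand, the two blocks can be fetched concurrently and combined into the memento on the client side.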

While Prefer: return=minimal might work here, the specs give no guarantee about what the server returns as a minimal representation. Hence, if we want tighter semantics about what the client is expecting, we can use a custom preference such as Prefer: memento-variant=index (as discussed in a potential RFC extension discussion).

ibnesayeed avatar Aug 08 '18 22:08 ibnesayeed
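
On the server side, choosing between the two preferences could look roughly like this. It is a sketch under assumptions, not existing ipwb code: `parse_prefer` and `applied_preference` are hypothetical names, the parser is deliberately simplified (it ignores preference parameters and does not handle commas inside quoted values), and memento-variant=index is the custom preference proposed above rather than a registered one.

```python
def parse_prefer(header_value):
    """Parse a Prefer header into {preference: value}; parameters are dropped."""
    prefs = {}
    for item in header_value.split(","):
        token = item.split(";", 1)[0].strip()  # drop any ";param=value" parts
        if not token:
            continue
        name, _, value = token.partition("=")
        # If a preference is repeated, keep the first occurrence.
        prefs.setdefault(name.strip().lower(), value.strip().strip('"'))
    return prefs


def applied_preference(header_value):
    """Return the Preference-Applied value for an index-record response, or None."""
    prefs = parse_prefer(header_value)
    if prefs.get("memento-variant") == "index":
        return "memento-variant=index"
    if prefs.get("return") == "minimal":
        return "return=minimal"
    return None
```

When `applied_preference` returns a value, the memento endpoint would send the JSON index record along with a matching Preference-Applied header; otherwise it would fall back to regular replay.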