
Node.js document loader needs HTTP caching

Open ariutta opened this issue 9 years ago • 8 comments

The current code contains the following comment in three places in the Node.js document loader:

TODO: disable cache until HTTP caching implemented:

I can't find an open issue for this. Can anyone give additional specs on what is desired here?

ariutta, Feb 01 '16 23:02

I can't find an open issue for this.

Then this is the issue for it! :)

HTTP caching isn't implemented in the node.js document loader, i.e., HTTP cache headers are not understood. In the meantime, a really hacky, in-memory, temporary document cache was put in place that just stored docs retrieved from a URL in the past few seconds, but it can cause headaches (because it's hacky), so it was disabled. The right solution is to require a module that understands HTTP cache rules and to add options that let users specify where the cache is stored, or similar.

So, ideally, one could create a node.js JSON-LD doc loader and configure it by giving it some kind of interface (or whatever config it needs) for storing the cache. Internally, some open source module (I presume one exists for node) would then be pulled in to handle the HTTP caching rules and read/write from that cache. A small, in-memory cache could be used by default.

dlongley, Feb 02 '16 00:02
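To make that proposal concrete, here is a minimal sketch, not jsonld.js code: a loader that accepts a pluggable store (anything with `get`/`set`, defaulting to a `Map`) and honours only the `max-age`, `no-store` and `no-cache` directives of `Cache-Control`. The names `createCachingDocumentLoader`, `parseMaxAge`, `store`, `fetchImpl` and `now` are hypothetical, and the default assumes a global `fetch` (Node.js 18+). Full HTTP caching also covers revalidation (`ETag`, `Last-Modified`), `Expires`, `Vary` and more, which is why delegating to a dedicated HTTP caching module, as suggested, is the better long-term choice.

```javascript
// Hypothetical sketch: a JSON-LD document loader that honours a small subset
// of HTTP caching rules and writes to a pluggable store (any object with
// get(key) and set(key, value); a Map is the in-memory default).
function parseMaxAge (cacheControl) {
  if (!cacheControl) return 0
  const directives = cacheControl.toLowerCase().split(',').map(d => d.trim())
  // Simplification: treat no-cache (which really means "revalidate first")
  // the same as no-store, i.e. do not cache at all.
  if (directives.includes('no-store') || directives.includes('no-cache')) return 0
  const maxAge = directives.find(d => d.startsWith('max-age='))
  const seconds = maxAge ? Number.parseInt(maxAge.slice('max-age='.length), 10) : 0
  return Number.isFinite(seconds) && seconds > 0 ? seconds : 0
}

function createCachingDocumentLoader ({
  store = new Map(),
  fetchImpl = globalThis.fetch,
  now = () => Date.now()
} = {}) {
  return async function documentLoader (url) {
    // Serve from the store while the entry is still fresh
    const cached = store.get(url)
    if (cached && cached.expires > now()) return cached.remoteDocument

    const res = await fetchImpl(url, {
      headers: { Accept: 'application/ld+json, application/json' }
    })
    if (!res.ok) throw new Error(`Failed to load ${url}: HTTP ${res.status}`)

    // RemoteDocument shape expected by jsonld.js document loaders
    const remoteDocument = {
      contextUrl: null,
      documentUrl: url,
      document: await res.json()
    }
    // Only store responses the server declared cacheable for some time
    const maxAge = parseMaxAge(res.headers.get('cache-control'))
    if (maxAge > 0) {
      store.set(url, { remoteDocument, expires: now() + maxAge * 1000 })
    }
    return remoteDocument
  }
}
```

Because the store only needs `get` and `set`, a user could pass an adapter over disk or a shared cache instead of the default `Map`, which is the kind of configurability described above.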

What do you think about how the Netflix Falcor Model Cache works? It's in memory, can be assigned a maximum size and can be set to expire immediately, never, at an absolute time or at a relative time.

The big difference: it uses client-side caching instead of relying on HTTP caching, as described by Jafar Husain:

... if you want to build a cache for a single page web application, you're rarely leaning on HTTP caching because it's too coarse-grained. If I make a single request to the server, usually in web applications, you're downloading a lot of different individual items which could change at different times, rather than making a single HTTP request for one resource like we did, say, in the World Wide Web days

The link will take you to more of his answer to the question: "Hey, why are you caching inside of your application? Just rely on the HTTP cache..."

ariutta, Feb 02 '16 19:02
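For comparison with the HTTP-driven approach, a Falcor-style cache can be sketched as a size-bounded LRU with per-entry expiry chosen by the client. This is a hypothetical illustration of the options listed above (maximum size; expire immediately, never, at an absolute time, or after a relative delay), not Falcor's actual API; `ExpiringLruCache` and its option names are made up for the example.

```javascript
// Hypothetical sketch of a size-bounded, in-memory cache with per-entry
// expiry, loosely modelled on the options described for Falcor's model
// cache (this is NOT Falcor's API). Least-recently-used entries are evicted
// once maxSize is exceeded.
class ExpiringLruCache {
  constructor ({ maxSize = 100, now = () => Date.now() } = {}) {
    this.maxSize = maxSize
    this.now = now
    this.entries = new Map() // Map keeps insertion order: oldest first
  }

  // expires: 'never', 'immediately', { at: epochMillis } or { after: millis }
  set (key, value, expires = 'never') {
    this.entries.delete(key) // drop any old value; re-insert as most recent
    let expiresAt
    if (expires === 'never') expiresAt = Infinity
    else if (expires === 'immediately') return // nothing to keep
    else if (expires && typeof expires.at === 'number') expiresAt = expires.at
    else if (expires && typeof expires.after === 'number') expiresAt = this.now() + expires.after
    else throw new TypeError('Unsupported expiry option')

    this.entries.set(key, { value, expiresAt })
    while (this.entries.size > this.maxSize) {
      // The first key in iteration order is the least recently used
      this.entries.delete(this.entries.keys().next().value)
    }
  }

  get (key) {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key)
      return undefined
    }
    // Refresh recency on every hit
    this.entries.delete(key)
    this.entries.set(key, entry)
    return entry.value
  }
}
```

The trade-off is the one Jafar Husain describes: the client decides how long each item stays valid, independently of whatever cache headers the server sends.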

Related comment from @dlongley on another issue:

There is also this: https://www.npmjs.com/package/cached-request

It supposedly works with request, which the node.js document loader presently uses.

cached-request sounds great for use as a drop-in replacement.

But one thing that might make the code easier to maintain would be using a request library that has the same API in Node and the browser. The request package can be browserified, but the result is very large (1.1MB minified). There's also this library that is much smaller (163KB minified): https://github.com/substack/hyperquest

The reason I was checking out superagent is that the browserified code is so small (13KB minified), and it uses the same API in Node and the browser.

ariutta, Feb 05 '16 19:02

Well, honestly, I'm not that concerned about the maintenance costs for having two separate mechanisms in node vs. the browser. The browser has just about everything already built-in; we don't need to send anything at all if we just use what's there. That doesn't give us a common API, but we're only talking about a relatively small function that doesn't really change all that much.

dlongley, Feb 05 '16 19:02
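To illustrate how small the browser-side function can be: in browsers, `fetch()` already goes through the HTTP cache, so a loader only needs to request the document and wrap the response. This is a minimal sketch with a hypothetical name (`createBrowserDocumentLoader`), not the loader jsonld.js actually ships; `cache: 'default'` is the standard Fetch option that lets the browser apply its normal freshness and revalidation rules.

```javascript
// Hypothetical sketch: in a browser, fetch() consults the HTTP cache and
// applies Cache-Control / ETag rules itself, so the loader does no caching
// of its own and just wraps the response as a RemoteDocument.
function createBrowserDocumentLoader (fetchImpl = globalThis.fetch) {
  return async function documentLoader (url) {
    const res = await fetchImpl(url, {
      cache: 'default', // let the browser's HTTP cache decide freshness
      headers: { Accept: 'application/ld+json, application/json' }
    })
    if (!res.ok) throw new Error(`Failed to load ${url}: HTTP ${res.status}`)
    return {
      contextUrl: null,
      documentUrl: res.url || url, // final URL after any redirects
      document: await res.json()
    }
  }
}
```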

It's possible now to at least cache known context URLs in advance, using jsonld-document-loader:

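```javascript
const { documentLoaders, expand } = require('jsonld')
const { JsonLdDocumentLoader } = require('jsonld-document-loader')

// CommonJS has no top-level await, so the setup runs in an async function
async function createLoader () {
  // fetch and add the static context document to a document loader
  const loader = new JsonLdDocumentLoader()
  const context = await documentLoaders.node()('https://schema.org')
  // depending on the jsonld.js version, `document` may be a string or parsed JSON
  const contextDocument = typeof context.document === 'string'
    ? JSON.parse(context.document)
    : context.document
  loader.addStatic('https://schema.org', contextDocument)
  return loader
}

// ...

// expand the doc using the (pre-populated) document loader
async function expandDoc (loader, doc) {
  return expand(doc, {
    documentLoader: loader.documentLoader.bind(loader),
  })
}
```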

hubgit, Jun 17 '20 23:06
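As a variation on preloading known contexts, a wrapper can cache them lazily on first use instead. This is a hypothetical helper (`memoizeDocumentLoader` is not part of jsonld.js or jsonld-document-loader). It caches the pending promise so concurrent requests for the same URL share one fetch, and evicts failed loads so they can be retried. Entries never expire and HTTP cache headers are ignored entirely, so it is a stopgap rather than real HTTP caching.

```javascript
// Hypothetical helper: wrap any document loader so each URL is loaded at
// most once per process. Extra arguments are passed through unchanged.
function memoizeDocumentLoader (documentLoader, cache = new Map()) {
  return function cachedDocumentLoader (url, ...rest) {
    if (!cache.has(url)) {
      const pending = Promise.resolve()
        .then(() => documentLoader(url, ...rest))
        .catch(err => {
          cache.delete(url) // do not cache failures; allow a retry
          throw err
        })
      cache.set(url, pending) // concurrent callers share this promise
    }
    return cache.get(url)
  }
}
```

For example, `memoizeDocumentLoader(documentLoaders.node())` could be passed as the `documentLoader` option to `expand`, reusing one wrapped loader across calls.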

Seems to me that fixing this issue would (most likely) also solve #420.

jonassmedegaard, Dec 15 '20 18:12

The got module claims to be the only one among several HTTP clients that properly supports caching: https://github.com/sindresorhus/got#comparison

Here's their guide to migrate from request: https://github.com/sindresorhus/got/blob/master/documentation/migration-guides.md

jonassmedegaard, Dec 15 '20 18:12