Feat/http cache
Implementation for https://github.com/comunica/comunica/issues/166
@rubensworks I've tried it again, and it still fails for 14.x and 16.x. You said before that this can be ignored. Is it time to check this in?
Thanks @jaxoncreed! I will look into this soon!
@brechtvdv Can you validate this and check whether this would be usable within the LDES client?
@jaxoncreed In the browser I believe you cannot read the cache-control/etag headers when doing HTTP requests: this is only a NodeJS contribution, right?
Yes, actor-http-cache would allow us to rewrite the getPage function (https://github.com/TREEcg/event-stream-client/blob/main/packages/actor-init-ldes-client/lib/EventStream.ts#L512). @jaxoncreed: does it matter whether its an HTTP or HTTPS request? One thing that we need is follow-redirects. This is probably not handled currently?
@pietercolpaert Yes, it would seem that this would only work in a NodeJS environment, so I guess you should have a separate configuration for an in-browser environment.
@brechtvdv No follow-redirects are not handled
@pietercolpaert Yes, it would seem that this would only work in a NodeJS environment, so I guess you should have a separate configuration for an in-browser environment.
Could you look into creating such a separate configuration? So that we don't break browser builds when we merge this.
I guess we can follow a similar approach as we do for @comunica/query-sparql-file, where we just have a browser config, and a node config that extends from it and includes the cache actor.
See https://github.com/comunica/comunica/tree/master/engines/config-query-sparql/config, https://github.com/comunica/comunica/blob/master/engines/query-sparql-file/package.json#L174, https://github.com/comunica/comunica/blob/master/engines/query-sparql-file/package.json#L178-L181.
@brechtvdv No follow-redirects are not handled
If the http-fetch actor is configured properly, then redirects will be followed correctly.
Note to self: before merging, we should do an in-depth manual check to see if everything still works correctly with this. For example, we should test and inspect TPF queries.
@rubensworks Okay, I think I've covered all the comments.
Hey @rubensworks sorry for the delay. I've now refactored the code. There's a few things to point out:
Firstly, remember that request.body is, in actuality, either a ReadableStream or a NodeJS.Readable when it comes in, depending on the environment. Later in the code, it is normalized to a NodeJS.Readable, but down in the HttpLayer it remains in the superposition of being one or the other. For this code to work, response.body requires methods that are exclusive to both ReadableStream (body.cancel, body.tee) and NodeJS.Readable (body.on). Would it make sense to normalize the body type to NodeJS.ReadableStream (ie in ActorHttpFetch)? Keeping it in this superposition requires a lot of forced casting and ts-ignores.
Secondly, Components.js is not able to recognize built-in web types like Buffer or ReadableStream. I've left these blank for now, but left a comment where they can't be used (https://github.com/jaxoncreed/comunica/blob/feat/http-cache/packages/actor-http-cache/lib/HttpCacheStorageStream.ts#L149)
Hey @rubensworks sorry for the delay. I've now refactored the code.
Super! Is this ready for review?
Would it make sense to normalize the body type to NodeJS.ReadableStream (ie in ActorHttpFetch)? Keeping it in this superposition requires a lot of forced casting and ts-ignores.
We should only make assumptions about the (non-Node) ReadableStream interface, so I would not do this.
Once we migrate away from node-fetch (probably only after Node < 18 LTS support is dropped), we'll be able to migrate fully to fetch interfaces, and the weird superposition of ReadableStream and NodeJS.Readable will be a thing of the past.
Secondly, Components.js is not able to recognize built-in web types like Buffer or ReadableStream
You can add these to https://github.com/comunica/comunica/blob/master/.componentsjs-generator-config.json
Fantastic, I've updated the comunica issue.
And yep, we're ready for a review.
@rubensworks I did a little experiment and I'm pulling my hair out. This will not work in the web browser. Ugh. I'll need to refactor everything and get back to you :( :( :( :( :( :( :(
Then again, does this need to work in the web browser? The web browser has its own cache, so would there be any instance were a browser-focused comunica would use this cache?
For example, I could just update the test function to throw an error if it's used in a web browser. Then, I could always just assume that the stream is a Node Stream. Does that work for you?
One thing to keep in mind is I'm unsure if React-Native environments would always have access to Node Streams.
Then again, does this need to work in the web browser? The web browser has its own cache, so would there be any instance were a browser-focused comunica would use this cache?
Yes, this would be important in cases where we want other actors to be notified of cache invalidations via the HTTP invalidation actions.
For example, I could just update the test function to throw an error if it's used in a web browser. Then, I could always just assume that the stream is a Node Stream. Does that work for you?
Given all the work that went into this PR already, I'm ok with scoping this as a Node-only actor.
However, just checking for browser presence in the test function will not work, as Webpack will already error at build-time.
Instead, we should create a dedicated Node and browser config file of the query-sparql engine, similar as how it's done in query-sparql-file, so that this actor is only included in Node builds.
I remember that you said at some point that you wanted to use this functionality in one of your libraries, which I assume is browser-side? Would this then still work for you?
Sorry about the frustration I expressed before. I had a goal to have this complete by the end of the week, and that wasn't hit.
Anyway, I looked back into getting things to work in the web browser, and I figured out how to do it, but many of the techniques used to clone streams and create new streams are completely different. (Here's the demo I made. I've tested to be sure it works in the browser https://gist.github.com/jaxoncreed/d6c8c4ec59e08b6bb8710ed34a12f41b)
So, from an architecture perspective, would you rather have separate classes that work in each environment then have those classes wired up in componentsJS, or keep the same classes but provide a bunch of conditionals depending on the environment?
@rubensworks which architecture would you like?
I'll get back to you in January!
@jaxoncreed If we can make it work in both Node and the browser, that would be best option! But as I've said before, given all the effort that went into this PR already, it would be sufficient to have only Node support for now.
So, from an architecture perspective, would you rather have separate classes that work in each environment then have those classes wired up in componentsJS, or keep the same classes but provide a bunch of conditionals depending on the environment?
If you think supporting both Node and the browser will be feasible, then I would recommend implementing everything in one package, and using the "browser" package.json field to override Node-specific classes with browser-specific classes. Similar to how the FetchInitPreprocessor is overridden for the browser in our fetch actor.
The trick here will be to make sure that logic that is different between Node and the browser is placed into one or more distinct file(s), and common code is shared.
To make sure that we test both variants of the code, we'll have to test them via system tests (which are run both on Node and on browsers in our CI).
Okay excellent. I'll get to work.
@rubensworks Do you know if there's an easy way to run tests in the web browser? Just to be sure everything's working properly.
@rubensworks Do you know if there's an easy way to run tests in the web browser? Just to be sure everything's working properly.
@jaxoncreed We recently started also running our system tests (that we already ran in Node) in browsers using Karma: https://github.com/comunica/comunica/blob/master/package.json#L88 For example, tests like these. So you could try writing such system tests as well (next to the specific actor unit tests, which I would run only in Node). The only caveat is that no jest-specific things can be used in this case, but I think this is not required here.
Hey @rubensworks I'm having a bit of trouble getting the system tests to work. Even on a clean install, they time out.
I'm cloning the repo, running yarn install, then running yarn test. Is there anything else I need to do?
I'm cloning the repo, running yarn install, then running yarn test. Is there anything else I need to do?
Nope, that's all you need to do.
But they did work before for you, or not?
System tests did work before. But after updating to the most recent version on master, they fail by timing out
System tests did work before. But after updating to the most recent version on master, they fail by timing out
That's weird. Nothing really changed though. Since the CI still works, and they don't time-out on my machine and on the machines from other people I know, I suspect it will be an issue on your end somehow.
Okay! It looks all the tests are passing except for any tests that involve solid-authn-js, and I have not yet found a solution for that.
The tests will have no problem with running the setup (https://github.com/jaxoncreed/comunica/blob/feat/http-cache/engines/query-sparql/test/QuerySparql-solid-test.ts#L133-L136), but when it runs the actual test (https://github.com/jaxoncreed/comunica/blob/feat/http-cache/engines/query-sparql/test/QuerySparql-solid-test.ts#L141-L143), jest freezes. Tests won't even time out.
More specifically, they will freeze because of a low-level crypto function that signs DPoP tokens (https://github.com/panva/jose/blob/main/src/runtime/node/sign.ts#L30). There may be some kind of problem with promisifying the crypto function, but the confounding part is that this has nothing to do with the cache.
Even more strange, when you force the cache to always return undefined, the stream will be consumed prematurely. The stream is still open inside ActorHttpCache but is closed by the time it reaches ActorRdfHypermediaNone. For some reason, the stream is being consumed somewhere after the cache code.
@rubensworks if any of this sounds like something you've dealt with before, let me know, otherwise I'll keep at it.
@jaxoncreed Ok, interesting. Since you also get these freezing tests on the master branch, this probably isn't a caching problem indeed. So I would suggest not losing too much time on this. As long as the tests pass in the CI, it's fine by me.
@jeswr have you run into this problem that @jaxoncreed is describing before? If so, how can it be resolved?
Unfortunately, I fixed the master branch and the freezing only happens with the cache now. It does not pass CI.
@jeswr have you run into this problem that @jaxoncreed is describing before? If so, how can it be resolved?
Yes - tests freeze a lot on me when I run the whole test suite (for me it has been happening prior to the introduction of Solid specific tests), I had pinned it on my machine since the Linux Kernel wasn't fully supporting all of the hardware on my work machine until recently. Locally, I've just been testing only the suites for the files I have changed and then skipping the pre-commit check and then relying on CI to run all the tests.
I don't have a fix for this specific solid test.
Thanks @jeswr . It looks like problem persists in the CI when you include the cache I wrote. Also, it looks like the integration tests freeze as well. I'm going to see if they're freezing for the same reason....
Okay, integration tests are freezing for some reason other than the crypto library. Not sure what it is yet. Looking into it.