chore: run unauthenticated fetch before authenticated
closes #126
@NSeydoux can you review this quickly
I'm not sure this is actually the fix we really want. | In any case, I think this issue is broader than just the web client here, and should eventually be tackled on a higher level.
@NSeydoux - correct me if I'm wrong; but won't this become the default logic of the Inrupt libraries in a few months anyway?
Especially in cases where we're doing link traversal, where the number of requests can go in the order of thousands
I see your point here; does it make sense to then use a caching mechanism like; and if so should we implement something similar when we add this kind of logic to solid-client-authn
if (config.context.workerSolidAuth) {
const authenticatedFetch = workerToWindowHandler.buildAuthenticatedFetch();
const authUrls: Record<string, boolean> = {}
async function comunicaFetch(...args) {
if (authUrls[getNamesapce(args[0])]) {
// In future we should check if this is a 403? (double check code) response
// and if so try the *unathenticated* fetch. This handles cases like a qpf endpoint
// changing an endpoint from being auth-required to public halfway through a session
return await authenticatedFetch(...args);
}
const response = await global.fetch(...args);
if (response.status === 401) {
authUrls[getNamespace(args[0])] = true;
return await authenticatedFetch(...args);
}
return response;
}
config.context.fetch = comunicaFetch;
}
^^this solves performance when there is large number of requests to a QPF endpoint.
Especially in cases where we're doing link traversal
So if we are doing lots of traversal outside then this isn't an issue a Pod this isn't an issue, as all the documents we get should be able to retrieved with the first call; but if we are traversing documents within a Pod I do see your point. I'm not sure if it makes sense to cache origins rather than namespaces to make particular authenticated calls on (though that seems fraught with potential problems).
Alternatively, we could have a (non-default) option in the web client that enables this fallback-based fetch implementation.
I agree with the idea of an option; but strongly disagree with the default order. I think this fallback-based implementation should be the default as we should favour correctness over performance (which I know is bold of me to say given how many performance rabbitholes I've gone down in the last few months).
Good idea, I think this namespace/origin-based caching should solve all major performance issues that could arise from this mechanism. And this could then become the default indeed.
I'm not sure yet though what you have in mind exactly for this getNamespace implementation. Just removing the path of the URL?
The next version of solid-client-authn will indeed rely on a similar logic (defaulting to unauthenticated fetch, and only authenticating if necessary), because preemptive authentication is causing all sorts of issues, in addition of requiring the client to make a guess about the authentication mechanism, which should be discovered from the resource server.
However, you are absolutely right, it would not be acceptable to issue twice as much requests, and optimizations such as the caching @jeswr proposed will be required.
My stance is that we should be caching by namespace rather than origin, i.e. http://example.org/path?query=1 goes to http://example.org/path rather than http://example.org/
This could be implemented as
function getNamespace(input) {
const url = new URL(new Request(input).url);
return url.origin + url.pathname;
}
@rubensworks I've added the caching optimisations now. Still won't be as optimal as you want for link traversal to different paths within the same Pod. However, the auth requirements for each of those files does need to be checked independentally as they will have different permissions.
In the future I guess we can use Pod metadata to optimise this.
My stance is that we should be caching by namespace rather than origin, i.e. http://example.org/path?query=1 goes to http://example.org/path rather than http://example.org/
But in that case we still suffer from the performance problems, right?
For example, when traversing over a pod with 100 Turtle files in http://example.org/container/, this would require 200 HTTP requests (instead of 100).
So perhaps origin-based caching may be better after all?
However, the auth requirements for each of those files does need to be checked independentally as they will have different permissions.
If I'm correct, using authenticated fetch is never a problem on pods, right? I've successfully done authenticated queries over pods with public resources in the past. So the false-positives in that case may not be a problem.
In any case, a way to disable this behaviour seems important.
In the future I guess we can use Pod metadata to optimise this.
Definitely, but we just have to be sure that we don't break existing solutions in order to fix other problems :-)
If I'm correct, using authenticated fetch is never a problem on pods, right?
I have a hunch that this is implementation dependent - and it just so happens that our current implementations are ok with it. But that is more of a questions for someone like @NSeydoux .
In any case, a way to disable this behaviour seems important.
Agreed - I think we were just in disagreance on which should be the default. I can add in that option here & to the website once we make a decision on whether this is the correct caching strategy for us right now.
Any updates on this?