rdflib.js
rdflib.js copied to clipboard
301 redirect : Wrong containment triples when fetching Container from NSS without trailing slash
Steps to reproduce
- fetch a container from NSS without using a trailing slash in it's URI
- query the store for the same URI as subject and
ldp:containsas predicate
Expected result
The canonical container URI is with a slash, so the query should not give a result
Actual result
A statement with a wrong relative URI as the object
Example
- Fetch https://bourgeoa.solidcommunity.net/chats
- Store contains the following triple:
<https://bourgeoa.solidcommunity.net/chats>
<http://www.w3.org/ns/ldp#contains>
<https://bourgeoa.solidcommunity.net/10b089b1-cec0-475f-9be3-7078d950c04c/> .
Correct containment statement would be
<https://bourgeoa.solidcommunity.net/chats/>
<http://www.w3.org/ns/ldp#contains>
<https://bourgeoa.solidcommunity.net/chats/10b089b1-cec0-475f-9be3-7078d950c04c/> .
Note the difference in the subject URI (trailing slash) and the object URI (relative to
/chats/)
This statement is not in the store, besides following the redirect from NSS. Only when fetching directly with a trailing shlash everything works as expected.
Test to reproduce:
describe("container URI without trailing slashes", () => {
it("should contain nothing", async () => {
const store = graph();
const fetcher = new Fetcher(store);
await fetcher.load("https://bourgeoa.solidcommunity.net/chats");
const contains = store.statementsMatching(
sym("https://bourgeoa.solidcommunity.net/chats"),
sym("http://www.w3.org/ns/ldp#contains")
);
expect(contains).toEqual([]);
});
});
When a container URL is without a /, NSS is using a 301 redirect to a new URL ending with a / The issue is with 301 return code. When using 200 the result is correct.
@timbl I am wondering if rdflib does check the header for location or only content-location ?
When a server receives a GET request to /chat, it should do the required:
- If unauthorized to read
/, respond with 403. - If authorized to read
/, respond with 404.
It may do the optional:
- If unauthorized to read
/chats/, respond with 403. - If authorized to read
/chats/, respond with 301 to/chats/, or: - If authorized to read
/chats/, respond with 200 withContent-Locationincluding the URL of the requested representation, e.g.,/chats/.ttl,/chats/.html,/chats/.jsonld, where the payload refers to/chats/.
@csarven — In https://github.com/linkeddata/rdflib.js/issues/617#issuecomment-1512453461, it seems that the several chats should become chat, or the single chat should become chats...
Here are all the headers:
$ curl -I https://bourgeoa.solidcommunity.net/chats
HTTP/1.1 301 Moved Permanently
X-Powered-By: solid-server/5.7.7
Vary: Accept, Authorization, Origin
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Authorization, User, Location, Link, Vary, Last-Modified, ETag, Accept-Patch, Accept-Post, Updates-Via, Allow, WAC-Allow, Content-Length, WWW-Authenticate, MS-Author-Via, X-Powered-By
Allow: OPTIONS, HEAD, GET, PATCH, POST, PUT, DELETE
Link: <chats.acl>; rel="acl", <chats.meta>; rel="describedBy", <http://www.w3.org/ns/ldp#Resource>; rel="type"
Location: /chats/
Content-Type: text/plain; charset=utf-8
Content-Length: 41
Date: Tue, 18 Apr 2023 21:39:09 GMT
Connection: keep-alive
Keep-Alive: timeout=5
$ curl -I https://bourgeoa.solidcommunity.net/chats/
HTTP/1.1 200 OK
X-Powered-By: solid-server/5.7.7
Vary: Accept, Authorization, Origin
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Authorization, User, Location, Link, Vary, Last-Modified, ETag, Accept-Patch, Accept-Post, Updates-Via, Allow, WAC-Allow, Content-Length, WWW-Authenticate, MS-Author-Via, X-Powered-By
Allow: OPTIONS, HEAD, GET, PATCH, POST, PUT, DELETE
Link: <.acl>; rel="acl", <.meta>; rel="describedBy", <http://www.w3.org/ns/ldp#Container>; rel="type", <http://www.w3.org/ns/ldp#BasicContainer>; rel="type"
WAC-Allow: user="read",public="read"
MS-Author-Via: SPARQL
Updates-Via: wss://bourgeoa.solidcommunity.net
Content-Type: application/octet-stream; charset=utf-8
Content-Length: 2
ETag: W/"2-nOO9QiTIwXgNtWtBJezz8kv3SLc"
Date: Tue, 18 Apr 2023 21:40:15 GMT
Connection: keep-alive
Keep-Alive: timeout=5
$
So you are saying it looks as though rdflib uses the location of the initial resource not the one after the redirect, as a base address for parsing the data.
@TallTed Good catch. I missed the slashes in the last item. Updated.
So you are saying it looks as though rdflib uses the location of the initial resource not the one after the redirect, as a base address for parsing the data.
Yes. When I looked at rdflib code I did not find any use of location in relation with 301, but only of content-location in fetcher
In the Handle fetch() response content-location is checked has if it was a single location
But location is nowhere checked
Is the rdflib webOperation meant to do :
- use the resource location : not the case actually
- and use content-location resources : not the case actually with multiple files to load https://github.com/linkeddata/rdflib.js/blob/c5bcd953f9fc8f8e18ad47ef531144eee0d7f834/src/fetcher.ts#L1951-L1965
Yes, I suppose in case of a 301 redirect rdflib should use the URI of the location header as the base URI, not the URI it requested initially and got redirected.
In https://github.com/linkeddata/rdflib.js/blob/main/src/fetcher.ts#L604 maybe original is the problem. I suspect it goes back to the original URI to avoid using the URI sent to a proxy when using a proxy. So we also want to avoid using the URI from the last fetch operation.
Well I just found out the same @timbl already said :sweat_smile:. But yes, I can confirm that's the problem. I've tried to reproduce the issue, and changing the following in https://github.com/linkeddata/rdflib.js/blob/main/src/fetcher.ts#L1161 the test passes:
.then(response => {
if (options.baseURI !== response.url) {
options.baseURI = response.url;
}
if (options.original.value !== response.url) {
options.original.value = response.url;
}
return this.handleResponse(response, response.url, options);
},
Not that this should be the solution, but it proves that the issue was indeed that the redirect was happening silently.
I found the same and checking for the Link Container in response of latest fetch does resolve the issue.
Replace https://github.com/linkeddata/rdflib.js/blob/main/src/fetcher.ts#L604 with
let baseUrl = options.original.value
const isContainer = kb.any(options.original, null, ns.ldp('Container'))
console.log('@@ isContainer ' + isContainer)
if (isContainer && !baseUrl.endsWith('/')) baseUrl = baseUrl + '/'
let p = N3Parser(kb, kb, options.original.value, baseUrl,
null, null, '', null)
@angelo-v
https://github.com/linkeddata/rdflib.js/pull/621 passes your test
here is the npm version [email protected] to test with PodOS
It fixes the issue with wrong containment triples.
But still rdflib states that the resource without a trailing slash is of type ldp:Container which i think is wrong.
But still rdflib states that the resource without a trailing slash is of type
ldp:Containerwhich i think is wrong.
@angelo-v
I suppose this patch will resolve your point https://github.com/linkeddata/rdflib.js/pull/621#discussion_r1186346645
With this patch a slash is also added to the resource without a trailing slash of type ldp:Container.
@angelo-v
here is the npm version [email protected] to test with PodOS