Preserving relative IRIs in RDF representations in server responses
I'm opening this issue to capture implementation feedback from @NoelDeMartin after his presentation at the Local-First online meetup (https://www.youtube.com/live/GDQMLt3oqio). It is based on the following Matrix conversation.
To my understanding, some of Noel's implementation depends on the Solid server using relative IRIs in its responses, specifically with BASE being set to the response location (Response.url).
As far as I can tell, the Solid Protocol currently doesn't offer such a guarantee. For example (PREFIX doesn't matter here, just the BASE):
App X
PUT /test HTTP/1.1
Host: pod.example
Content-Type: text/turtle
PREFIX ex: <https://ns.example/>
<> a ex:Test .
The server, for example, stores it in an internal quadstore, and later:
App X
GET /test HTTP/1.1
Host: pod.example
Content-Type: text/turtle
<https://pod.example/test> a <https://ns.example/Test> .
Even if the server simply saved the Turtle verbatim to a filesystem or a document-oriented database, another app could still make an update in between:
App Y
PUT /test HTTP/1.1
Host: pod.example
Content-Type: text/turtle
PREFIX ex: <https://ns.example/>
<https://pod.example/test> a ex:Test .
So currently both the server and a client can cause data that was serialized with relative IRIs to end up serialized with absolute IRIs.
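To make the equivalence concrete, here is a minimal sketch that uses Python's `urllib.parse.urljoin` as a stand-in for the reference-resolution step of a Turtle parser. The helper name `resolve` and the constant `DOCUMENT_URL` are only illustrative; the URLs come from the example above. It shows that `<>` and `<https://pod.example/test>` denote the same subject once the base is the document URL, so the two serializations carry the same triple even though their text differs.

```python
from urllib.parse import urljoin

# URL the resource was PUT to and later fetched from (taken from the example above).
DOCUMENT_URL = "https://pod.example/test"


def resolve(reference: str, base: str = DOCUMENT_URL) -> str:
    """Resolve an IRI reference against a base IRI (RFC 3986 style resolution)."""
    return urljoin(base, reference)


# What App X wrote: the empty relative reference `<>`.
subject_written = resolve("")
# What the server (or App Y) wrote: the absolute IRI.
subject_returned = resolve("https://pod.example/test")

# Same RDF term after resolution, different text in the serialization.
assert subject_written == subject_returned == DOCUMENT_URL
```

The RDF graph is unchanged in both cases; only the serialization loses the information that the author originally used a relative reference.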
I would like to reference https://www.w3.org/TR/rdf11-concepts/#section-IRIs
Relative IRIs: Some concrete RDF syntaxes permit relative IRIs as a convenient shorthand that allows authoring of documents independently from their final publishing location. Relative IRIs must be resolved against a base IRI to make them absolute. Therefore, the RDF graph serialized in such syntaxes is well-defined only if a base IRI can be established [RFC3986].
If we want to guarantee that the RDF serializations the server responds with use relative URLs, with BASE set to the response location/URL, I think something along these lines would be required:
For text/turtle it would need to serialize it with BASE and remove the BASE from the content.
For application/ld+json it would need to compact the response with @context using @base, in that case I don't think the context could be removed so possibly it can't even be supported for JSON-LD. Maybe if @context only sets the @base it can be removed and consumer only needs to make sure to provide @base to the parser. Someone may need to check it.
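As a rough illustration of the Turtle step, the sketch below shortens absolute IRIs back to relative references before serialization. It is deliberately conservative and is not taken from any Solid server: the function name `relativize_same_document` is hypothetical, and it only handles same-document references (the document IRI itself and its fragments), leaving every other IRI absolute. A round-trip check ensures that the shortened form resolves back to the original IRI.

```python
from urllib.parse import urldefrag, urljoin


def relativize_same_document(base: str, iri: str) -> str:
    """Return a relative reference for IRIs pointing into the document at `base`.

    Only same-document references are shortened; any other IRI is returned
    unchanged. This is an illustrative simplification, not a full relativizer.
    """
    document, fragment = urldefrag(iri)
    if document != urldefrag(base).url:
        return iri  # different document: keep the absolute IRI
    candidate = f"#{fragment}" if fragment else ""
    # Only use the short form if it resolves back to exactly the same IRI.
    return candidate if urljoin(base, candidate) == iri else iri


base = "https://pod.example/test"
print(repr(relativize_same_document(base, "https://pod.example/test")))
print(repr(relativize_same_document(base, "https://pod.example/test#it")))
print(repr(relativize_same_document(base, "https://ns.example/Test")))
```

A serializer would write the returned string inside `<...>` and omit the `BASE` line, relying on the consumer to resolve against the response URL.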
Personally I only use full IRIs so this doesn't really impact my work, I'm capturing it as part of Solid CG responsibility of gathering feedback from implementers.
Duplicate of discussions in https://github.com/solid/specification/issues/69 , https://github.com/solid/specification/issues/194 , https://github.com/solid/specification/issues/342 , and several other related issues. I'm sorry I can't dig all of them up right now, but you may want to sweep through those and only add new information where it's appropriate.
(At that point, it might help to start from a clear use case, explain why the problem needs solving, review the current design decisions in the Solid Protocol, and then move toward evaluating potential solutions rather than jumping straight into one.)
I'd like to close this issue as a duplicate for now, to avoid repeating existing discussions and instead build on what's already there.
- https://github.com/solid/specification/issues/342 - this is a different issue, as I understand the use case it doesn't require fully preserving the content
- https://github.com/solid/specification/issues/194 - at first glance this one seems about containment and slash semantics, here the issue is different
- https://github.com/solid/specification/issues/69 - this doesn't seem to be related at all
We as a CG can decide to use a different approach to gathering implementers' feedback. In the referenced issues I haven't noticed references to specific implementations. @csarven please don't make arbitrary decisions about closing issues; if you act as CG draft editor, this feedback may be more appropriate for the LWS WG, where the work continues.
EDIT: I will not be present at CG meeting tomorrow but I still proposed topic related to this issue in case someone has constructive ideas.
Please stop mischaracterising my actions, they are not arbitrary. This is a strawman fallacy. I'm working to reduce duplication, not act as a gatekeeper. Creating issues without checking prior discussions or verifying alignment with Solid Protocol, and jumping straight to solutions without considering the use case, is not constructive. If this is purely implementation feedback or a solution proposal, take it to LWS.
To my understanding, some of Noel's implementation depends on the Solid server using relative IRIs in its responses, specifically with BASE being set to the response location
Just to clarify, my implementation doesn't depend on this; but I've found it to be useful when migrating PODs (or when domains change, like the infamous migration from solid.community to solidcommunity.net). If the paths are conserved as relative, migrating documents from one POD to another is very easy. Otherwise, you have to edit the base manually.
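The migration benefit can be sketched as follows. Only the two host names come from the thread; the account name, the document paths, and the helper `resolve_all` are hypothetical and chosen purely for illustration. The same stored relative references resolve into the new namespace once the document is served from the new location, whereas references stored as absolute IRIs keep pointing at the old host until someone rewrites them.

```python
from urllib.parse import urljoin

# Hypothetical document URLs before and after the domain change.
OLD_URL = "https://alice.solid.community/movies/1"
NEW_URL = "https://alice.solidcommunity.net/movies/1"


def resolve_all(document_url: str, references: list[str]) -> list[str]:
    """Resolve every stored reference against the URL the document is served from."""
    return [urljoin(document_url, ref) for ref in references]


# References as they would appear in a document written with relative IRIs.
stored_relative = ["", "#it", "../profile/card#me"]
# The same references if the server had stored them as absolute IRIs.
stored_absolute = resolve_all(OLD_URL, stored_relative)

# After copying the file verbatim to the new location:
migrated_from_relative = resolve_all(NEW_URL, stored_relative)
migrated_from_absolute = resolve_all(NEW_URL, stored_absolute)

assert all(iri.startswith("https://alice.solidcommunity.net/") for iri in migrated_from_relative)
assert all(iri.startswith("https://alice.solid.community/") for iri in migrated_from_absolute)
```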
Thank you for clarifying @NoelDeMartin, my question during the webinar was related to this slide: https://www.youtube.com/live/GDQMLt3oqio?t=835
As I understood, your app(s) can create data locally without user logging in. And if the user at some point decides to log in that specific data can be stored in the pod. So when you introduce the other two servers on your diagram, it's not that the same data gets replicated across all of them, it's simply that different data can be stored on each of those servers?
If that's the case, relative IRIs seem like an implementation detail of your application, and once data gets synced to one server, from there on it always simply uses full IRIs in that one server's namespace?
I would consider the 'migration' case separately, especially that IMO a proper migration wouldn't break incoming links, so for example it would also change DNS records or maybe one day data would use DIDs like https://keri.one/
The solid.community -> solidcommunity.net case I would consider more as a 'partial recovery' from a disaster, since all the external links have been broken. In other words, full migration vs. partial recovery; either way it sounds more like a separate use case and isn't directly related to normal day-to-day operation of apps and servers.
when you introduce the other two servers on your diagram, it's not that the same data gets replicated across all of them, it's simply that different data can be stored on each of those servers?
Yes, during that slide I was trying to make the point that as a user you can choose where to store your data, but in that example those 3 PODs would be for 3 different users (or one user with 3 different accounts). In my apps, if you want to change PODs you have to log out, and log in again in a different POD.
If that's the case, relative IRIs seem like an implementation detail of your application, and once data gets synced to one server, from there on it always simply uses full IRIs in that one server's namespace?
Well, as I mentioned in my application it isn't much of a problem either if the links are relative or absolute.
I was just making the observation that when I wanted to migrate PODs in the past, this process was a lot easier because I had written my apps in a way that they use relative paths when writing to the POD. And I think this relates to the spec in the sense that it should allow for this use case of retrieving relative paths, be it the default behaviour or using some headers. Currently, when you do a PUT request, many implementations will return the exact same thing you sent in the body when you GET the document. And I don't think that's ideal. In contrast, doing a PATCH with an application/sparql-update content type produces consistent behaviour (regardless of using relative or absolute paths in your update, you'll get the same response in the GET).
It would be nice if at least everything were consistent, but I'm not sure if that goes against the PUT semantics. In terms of how difficult it is for implementers, I would have to think it's not that difficult, given that PODs already support the behaviour I expect using PATCH requests.
In my apps, if you want to change PODs you have to log out, and log in again in a different POD.
Could you please clarify that? Does it mean that users can own multiple storages, possibly each on a different server, and they simply host different data on each of those storages, but if they want to migrate certain data from one storage to another, your apps allow it but require logging in again? Or do you only refer to using a different identity (WebID) provided by a different OIDC Provider? I guess it's not about someone using a different OIDC Provider with the same WebID.
I intentionally don't use the term pod here, since on the protocol level there is loose coupling between WebIDs, OIDC Providers and Storages, and I believe each of those can have a many-to-many relation and everything can be on a separate server or colocated in any combination. I hope keeping those distinctions will help us avoid missing any nuance.
Or you only refer to using different identity WebID provided by different OIDC Provider?
Yes, that's it. At the moment, my apps only support being logged in with one webId at a time, and using one storage. I guess you could use multiple storages if you want, given that everything is taken from the type index. But by default, it only uses the default storage (the first one listed in the profile, or found using the algorithm described in the spec).
I was just making the observation that when I wanted to migrate PODs in the past, this process was a lot easier because I had written my apps in a way that they use relative paths when writing to the POD.
If there was a standard way defined for migrating pods, would there be any other reason for you to care about relative vs. absolute IRIs, or is it just the pod migration convenience?
If there was a standard way defined for migrating pods, would there be any other reason for you to care about relative vs. absolute IRIs, or is it just the pod migration convenience?
It's just the migration convenience, if there was a standard way of migrating I probably wouldn't care.
Although, to be fair, I can imagine a POD shutting down and leaving users with the infamous .zip archive of all their data after the service has become obsolete. In that case, it would be nice if the RDF inside that .zip uses relative urls 😅.
But I guess I'm just extrapolating here. For practical purposes, the only problem I have now is the migration process.
I propose that we close this issue with the following distilled implementation feedback:
Given the lack of a standard way of migrating storage, relative URLs in data can be convenient when migrating. With a reliable storage migration mechanism in place, there is no need to preserve relative IRIs any more.
Since it only relates to storage migration, I think it doesn't justify adding complexity to the general protocol to preserve relative IRIs in HTTP responses. Of course I don't want to push my interpretation; @NoelDeMartin please feel free to propose a different conclusion, or let's just continue the conversation.
That looks good @elf-pavlik, you can close it if you want.
[@elf-pavlik]
PREFIX doesn't matter, just the BASE
The above confuses me. I've not been able to track it through this discussion, but I hope to derail any misguided train that may be running here, before it impacts other things.
PREFIX absolutely matters when Turtle is parsed for loading into any triple/quad store. BASE also matters during such process. Neither of these is about the Solid Protocol.
I'll leave it at that, unless/until someone disagrees and hopefully provides more detail about the initial position.
Thanks @TallTed I only meant that PREFIX didn't matter for the conversation here, since it was mostly about relying on base from Response.url rather than having absolute IRIs in serialization or an explicit BASE statement. One of the citations links to https://datatracker.ietf.org/doc/html/rfc3986#section-5.1 and in relation to that there's no need to pay attention to any PREFIX (https://www.w3.org/TR/curie/) in the snippet
.----------------------------------------------------------.
|  .----------------------------------------------------.  |
|  |  .----------------------------------------------.  |  |
|  |  |  .----------------------------------------.  |  |  |
|  |  |  |  .----------------------------------.  |  |  |  |
|  |  |  |  |       <relative-reference>       |  |  |  |  |
|  |  |  |  `----------------------------------'  |  |  |  |
|  |  |  | (5.1.1) Base URI embedded in content   |  |  |  |
|  |  |  `----------------------------------------'  |  |  |
|  |  | (5.1.2) Base URI of the encapsulating entity |  |  |
|  |  |         (message, representation, or none)   |  |  |
|  |  `----------------------------------------------'  |  |
|  | (5.1.3) URI used to retrieve the entity            |  |
|  `----------------------------------------------------'  |
| (5.1.4) Default Base URI (application-dependent)         |
`----------------------------------------------------------'
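The precedence in the diagram can be sketched as a small lookup: the innermost available base wins. The function name `establish_base` and its parameter names are hypothetical, and the sketch assumes every candidate is already an absolute IRI (RFC 3986 also allows an embedded base to be relative itself, which is not handled here). For Turtle, `embedded` would correspond to a `BASE` or `@base` directive and `retrieval` to Response.url.

```python
from typing import Optional
from urllib.parse import urljoin


def establish_base(
    embedded: Optional[str] = None,       # 5.1.1, e.g. a Turtle BASE directive
    encapsulating: Optional[str] = None,  # 5.1.2, base of an enclosing entity
    retrieval: Optional[str] = None,      # 5.1.3, e.g. Response.url
    default: Optional[str] = None,        # 5.1.4, application-dependent
) -> str:
    """Pick the base IRI following the RFC 3986 section 5.1 order of precedence."""
    for candidate in (embedded, encapsulating, retrieval, default):
        if candidate is not None:
            return candidate
    raise ValueError("no base IRI can be established")


# No BASE in the content: the retrieval URL is used, so `<>` is the document itself.
base = establish_base(retrieval="https://pod.example/test")
assert urljoin(base, "") == "https://pod.example/test"

# An explicit BASE in the content takes precedence over the retrieval URL.
base = establish_base(embedded="https://other.example/doc", retrieval="https://pod.example/test")
assert urljoin(base, "#it") == "https://other.example/doc#it"
```

This is why the discussion above only concerns the case where the serialization carries neither absolute IRIs nor an explicit `BASE`: only then does the response URL determine what the relative references mean.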