activitypods Consider reverting hierarchical paths for resources

In ActivityPods v2, resources will not have a URI dependent on the container path. Instead, the URI will use the root path (e.g. https://mypod.store/alice/data/) and a UUID slug. See this PR for more information about this initial choice.
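
For illustration only, here is a minimal Python sketch of the two URI schemes being compared. The function names and the `notes` container are hypothetical and are not taken from the ActivityPods code base; only the root path comes from the example above.

```python
import uuid

# Root path taken from the example above.
POD_ROOT = "https://mypod.store/alice/data/"


def mint_flat_uri(root: str = POD_ROOT) -> str:
    """Non-hierarchical scheme: root path + UUID slug, whatever the container."""
    return root.rstrip("/") + "/" + str(uuid.uuid4())


def mint_hierarchical_uri(container_uri: str, slug: str) -> str:
    """Hierarchical scheme: the resource URI is nested under its container URI."""
    return container_uri.rstrip("/") + "/" + slug


if __name__ == "__main__":
    print(mint_flat_uri())
    # "notes" is a hypothetical container name used only for this sketch.
    print(mint_hierarchical_uri(POD_ROOT + "notes/", "shopping-list"))
```

With the flat scheme the URI says nothing about which container(s) the resource belongs to; with the hierarchical scheme the container is part of the identifier.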

This is allowed by LDP, but goes against Solid specifications: https://github.com/solid/specification/issues/98

On the issue above, there are some very good arguments to allow non-hierarchical URIs, notably this one.

What should we do for ActivityPods?

Advantages of non-hierarchical URIs

We can move the resource to another container without leaving behind dozens of redirects
If the container is renamed, we don't need to rename (eg. put redirects) its hundreds of contained resources
We can put the resources in several containers, and thus benefit from the various containers' default WAC permissions
The ldp:contains predicate is the only source of truth about the resource containment
It's harder to guess what a resource is about (increased privacy)
We are closer to blockchain-style identifiers, where everything is a hash (which will make sense if/when we use NextGraph as a triple store)
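
To make the first few points concrete, here is a rough Python sketch in which containment is stored only as ldp:contains triples. The `ContainmentGraph` class, the container names and the UUID are assumptions made up for this example; a real pod would keep these triples in its triplestore.

```python
LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"


class ContainmentGraph:
    """Toy in-memory store where ldp:contains triples are the only containment record."""

    def __init__(self):
        self.triples = set()  # (subject, predicate, object)

    def attach(self, container, resource):
        self.triples.add((container, LDP_CONTAINS, resource))

    def detach(self, container, resource):
        self.triples.discard((container, LDP_CONTAINS, resource))

    def move(self, resource, source, target):
        # Only the containment triples change; the resource URI stays the same.
        self.detach(source, resource)
        self.attach(target, resource)

    def containers_of(self, resource):
        return sorted(
            s for (s, p, o) in self.triples if p == LDP_CONTAINS and o == resource
        )


ROOT = "https://mypod.store/alice/data/"
# Hypothetical UUID slug and container names, for illustration only.
note = ROOT + "0b1f6c1e-5d0a-4c8e-9d3f-2a7b9e4c1d11"
graph = ContainmentGraph()
graph.attach(ROOT + "notes/", note)
graph.attach(ROOT + "public/", note)  # same resource in a second container
graph.move(note, ROOT + "notes/", ROOT + "archive/")
print(graph.containers_of(note))
```

After the move, the resource is listed under the `archive/` and `public/` containers, and its URI never changed, so no redirect is needed.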

Advantages of hierarchical URIs

You cannot have orphan resources, as you cannot detach a resource from a container (?)
It's easier for developers to know what the resource is about
It's easier to find the container of the resource, without needing to look for ldp:contains (Note: not really a problem with triplestore storage!)
?
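
As a sketch of the third point, with hierarchical URIs the container can be derived from the resource URI alone. The helper below is hypothetical (its name and the `notes` container are assumptions); it simply drops the last path segment.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_container(uri):
    """Return the URI of the parent container, or None for the root."""
    parts = urlsplit(uri)
    path = parts.path.rstrip("/")
    if not path:
        return None  # the root container has no parent
    parent_path = path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))


if __name__ == "__main__":
    # "notes" is a hypothetical container name.
    print(parent_container("https://mypod.store/alice/data/notes/shopping-list"))
    print(parent_container("https://mypod.store/alice/data/notes/"))
```

With non-hierarchical URIs this shortcut is not available, and the container has to be found by querying for ldp:contains instead.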

May 06 '24 10:05 srosset81

Any feedback, @csarven @lecoqlibre @Laurin-W? Maybe I overlooked some advantages of hierarchical URIs?

IMO imposing a flat hierarchy on a graph is like trying to fit square pegs into round holes. It is doomed to create a lot of headaches, as we can see with shape trees (but that's another subject).

May 06 '24 16:05 srosset81

I'm happy with non-hierarchical approaches. It's unclear if this part of the spec is going to change in the future (some people dislike Solid's restriction). And for us, this has benefits right now which seem to outweigh the disadvantages.

I think this is also about the philosophy of viewing Solid as something whose inner workings should be easily understood by users vs. thinking of Solid as infrastructure. Solid gives us the features for decentralizing identity and storage, interoperability, etc., all of which benefit users.

But is it important for users to understand the inner workings of how data is structured by looking at the URI or is it important for interoperating applications to manage that data easily? I'd argue that making users understand their data is the responsibility of the applications.

May 23 '24 09:05 Laurin-W

I've expressed some of these things elsewhere (that I'd have to dig up) but I'll try to respond to some things here that may be of help for what it is worth.

But is it important for users to understand the inner workings of how data is structured by looking at the URI

That's not the use case for which hierarchical paths are used, but it is a useful outcome of the design. All things being equal, it is something that at least some users have some familiarity with, because it is essentially what they also see in their operating systems: the most common ones have followed the same design pattern for decades. If anything, a URI that uses slashes without any imposed hierarchy runs counter to what they are familiar with, to the point that it may work as an anti-pattern. If some hierarchy is not desired in the URI, then there is virtually no reason to use slashes. All URIs following a URI Template like /{uuid} would suffice, and it doesn't prevent any system from having a "graph" view of things.

I would also add that there is a difference between resource organisation and knowledge organisation, and they don't need to be conflated. If some piece of information is to be categorised in a certain way, that's about the underlying knowledge, and not about the identifier.

is it important for interoperating applications to manage that data easily

Yes, to some extent. The slash semantics are not so much there because they mirror a filesystem; rather, they are super useful for developers, who can re-use widely available existing libraries and tooling that 1) understand the hierarchy and 2) can communicate with their system. This in essence enables "interoperability" on different levels (between classes of products and the environments they are in), and not just classes of products interoperating among each other as defined in a specification.

Just my take on things..

May 23 '24 10:05 csarven

We can move the resource to another container without leaving behind dozens of redirects

Anything besides having all resources in the same container will be subject to that "move" issue over a long enough timeline. So, either this is about knowledge organisation, in which case the information should be a declarative part of the data or metadata, or it is about what the URI should be. When a URI is "allocated" to some thing (as per Web architecture), it carries a specific meaning. Redirecting it doesn't change the meaning. So, if something needs to be moved, it is because of other reasons.

If the container is renamed, we don't need to rename (eg. put redirects) its hundreds of contained resources

Why would the container name in the URI be "renamed"? The identifier is allocated. Why would that term need to be changed, if not for knowledge-organisation reasons?

We can put the resources in several containers, and thus benefit from the various containers' default WAC permissions

Putting a resource in different containers is about knowledge. Something can indeed be discovered from different containers, but that is orthogonal to how a particular resource is identified at a particular URI, and what that entails. I think different permissions potentially acting on the same resource will create more problems than actual benefits, but I'm curious to see implementation feedback on that.

The ldp:contains predicate is the only source of truth about the resource containment

Right. With the way Solid Protocol uses LDP BC at least.

It's harder to guess what a resource is about (increased privacy)

If privacy is the driving factor in allocating a resource to a URI, then the terms in the path should not reveal anything pertaining to what the resource may be about. There is only a marginal difference between using /{uuid} and /foo/bar/baz if one doesn't want to disclose what the resource is about by looking at the URI. (I'm not suggesting that either is a better URI design, just that there are different ways of looking at it, all things being equal.)

We are closer to blockchain-style identifiers, where everything is a hash (which will make sense if/when we use NextGraph as a triple store)

Ok. Not sure why this is relevant unless there is some kind of interop? Anything short of using an exact URI Template published by different systems would be non-interoperable by definition. So, there is no particular benefit even if URIs look like random-looking gibberish.

You cannot have orphan resources, as you cannot detach a resource from a container (?)

Something is always under a container, for instance the root container.

It's easier for developers to know what the resource is about

Why is this relevant for developers? Developers shouldn't have to know anything about the resource by observing the URI.

It's easier to find the container of the resource, without needing to look for ldp:contains (Note: not really a problem with triplestore storage!)

Right, the whole .. idea that's virtually everywhere under the sun.

May 23 '24 10:05 csarven

Since one of my initial comments was referenced in the initial post, I'll just share some of my evolved thinking.

Firstly, in a client-server architecture, the server wields much power, excessive power, some might say. Thus, it is important that the protocol design does not constrain users' behavior unnecessarily.

In the current design, it is up to the user to design their URI space; if they want to use non-hierarchical URIs, they can do so by using UUIDs in the root container, for example. I think this is important. One shouldn't think that Tim didn't appreciate the importance of not getting constrained to hierarchies; it is the Web, after all.

That things move around is inevitable, but it doesn't influence URIs if nothing in the URI has any semantics that change. This could point to UUIDs, but it also points in a different direction: do not use the hierarchy for knowledge organization; use, for example, SKOS for that (Tim once told me that he likes using URIs for knowledge organization, with which I disagree). Presently, WAC attaches some utility to the hierarchy, and I have long been concerned that this would cause practical problems, but it is really the only thing in Solid that does that.

I think there's a coordination problem here: The user of the pod should have the authority to design their URI space. If the server only allows non-hierarchical URIs, then it will infringe on the user's right here, so I am against that. But if apps simply grab some of the user's space without the user having any influence, that too would infringe on that right. Ultimately, I suppose you could say that the user wouldn't care, but then, I am also concerned that Solid would become just a datastore where the user would have little power to ensure interop, and so interop wouldn't happen.

I would also remark that I agree that it is very unfortunate that SPARQL hasn't played a greater role. The main reason I joined was to have federated query over decentralized pods. I think that's crucial, but there was no interest in it.

In conclusion, I'm now :-1: on getting rid of hierarchical paths; that would be a constraint on the user's freedom. Instead, other technologies should be used on top of the resources to add semantics to them.

May 28 '24 14:05 kjetilk

Nice to read your take on the matter, @kjetilk! And thank you for taking the time to answer.

Also, I am glad to see that we share some common views about SPARQL and your concern for the separation between knowledge/semantics and URIs. I know WAC quite well, having implemented some of the specs in SemApps, but I would say it is a minor issue that the paths there hold some semantics.

About the question of freedom for the end user: if it is important that they can build paths for their URIs, then we could implement that in a virtual way for them. The URL with a path would point internally to a UUID (without the user seeing that internal mechanism). We want to use UUIDs not because we want to impose anything on the user; it is just that internally, that's what we have. If I take the analogy with inodes again, the inode is hidden from the end user, who only sees files and directories. So we could probably do the same here: we would emulate a hierarchy that can be used in the paths of URLs, and internally it would all be UUIDs.
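
To sketch what such an emulated hierarchy might look like, here is a toy Python index that maps user-visible paths to internal UUIDs, much like a directory entry points to an inode. The `VirtualPathIndex` class and its methods are hypothetical and only illustrate the idea; nothing here is an existing ActivityPods or SemApps API.

```python
import uuid


class VirtualPathIndex:
    """Toy mapping from user-visible paths to internal UUIDs (inode-like)."""

    def __init__(self):
        self._by_path = {}  # user-visible path -> internal UUID

    def create(self, path):
        internal_id = str(uuid.uuid4())
        self._by_path[path] = internal_id
        return internal_id

    def resolve(self, path):
        # Returns None when the path is unknown.
        return self._by_path.get(path)

    def rename_prefix(self, old_prefix, new_prefix):
        # Renaming a "container" only rewrites index keys; UUIDs are untouched.
        for path in [p for p in self._by_path if p.startswith(old_prefix)]:
            suffix = path[len(old_prefix):]
            self._by_path[new_prefix + suffix] = self._by_path.pop(path)


index = VirtualPathIndex()
internal_id = index.create("/notes/shopping-list")  # hypothetical path
index.rename_prefix("/notes/", "/lists/")
print(index.resolve("/lists/shopping-list") == internal_id)
```

The design choice is that the internal identifier never changes, while the visible path is just one replaceable name for it.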

One question that I still have is what is supposed to happen when a URL contains some path (which I guess reflects a tree of containers, but maybe I am wrong here). If the paths are constructed from a hierarchy of containers and the URL is full of those paths, and if we consider that the URL is the unique identifier of the resource and that it should be permanent and never change, then what happens when the paths (containers) change after the resource was created and assigned a URL? I find the whole correlation between URL and path a bit confusing and unnecessarily risky for the long-term consistency of the links.

Could you shed some light on that question? How can we make the URLs durable if they contain paths, and those paths are subject to arbitrary changes? Does the spec say anything about that? Is it just left to the user to organize their pod as they please, and if they rename the paths/folders, then it is their fault and they should expect some 404s?

Can't we do a little bit better on that, to help the user not break his links?

May 28 '24 17:05 nikoPLP

One question that I still have is what is supposed to happen when a URL contains some path (which I guess reflects a tree of containers, but maybe I am wrong here). If the paths are constructed from a hierarchy of containers and the URL is full of those paths, and if we consider that the URL is the unique identifier of the resource and that it should be permanent and never change, then what happens when the paths (containers) change after the resource was created and assigned a URL?

I think there are two answers to this. One is that you use a conventional 301 redirect. Make sure the redirect is created automatically, so that if a move does happen, things don't break.
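
As a rough illustration of the automated redirect, here is a toy Python store that records a redirect every time a resource is moved. The `RedirectingStore` class and the paths are hypothetical; a real server would answer with an HTTP 301 status and a Location header instead of returning tuples.

```python
class RedirectingStore:
    """Toy store that records a permanent redirect whenever a resource moves."""

    def __init__(self):
        self.resources = {}  # current path -> body
        self.redirects = {}  # old path -> current path

    def put(self, path, body):
        self.resources[path] = body

    def move(self, old, new):
        self.resources[new] = self.resources.pop(old)
        # Repoint earlier redirects so clients never follow a chain of hops.
        for src, dst in list(self.redirects.items()):
            if dst == old:
                self.redirects[src] = new
        self.redirects[old] = new
        # The new location is live again, so it must not redirect anywhere.
        self.redirects.pop(new, None)

    def get(self, path):
        if path in self.resources:
            return 200, self.resources[path]
        if path in self.redirects:
            return 301, self.redirects[path]  # value for the Location header
        return 404, None


store = RedirectingStore()
store.put("/notes/shopping-list", "milk, eggs")  # hypothetical path and body
store.move("/notes/shopping-list", "/lists/shopping-list")
store.move("/lists/shopping-list", "/archive/shopping-list")
print(store.get("/notes/shopping-list"))
```

Because `move` is the only way to relocate a resource in this sketch, the redirect cannot be forgotten, which is the point of automating it.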

The other answer, I feel, is that you don't attach too much meaning to the URI. Nothing that the user sees should rely on the URI (which, I suppose, is the case with UUIDs), but you may use the internal structure of the URI for management within apps; that's fine as long as you don't rely on it being static. But best practice should be to rely on the RDF describing the resource semantics and use that.

But then, I stopped developing on Solid, so it is ultimately implementation experience that counts.

May 28 '24 18:05 kjetilk