Discovery of storage
While writing tests for Section 4.1, I found no reliable way to discover a storage, and therefore no reliable way to test this section's requirements.
Now, since we generally have a storage at the server's /, you could for the most part just start with that and the tests would pass, but that's not what the requirement says. A server may support several storages, and parts of the URI space need not belong to any storage.
In the current protocol, you can discover the storage by traversing towards its root container. So, in principle, you could GET /foo/bar to discover that the root container is /foo/. However, this can't be used for discovery in general, because it would require the entire URI space to be covered by storages, and that may not be the case. If I have understood it correctly, you could, for example, have /pods/users/foo/ as the root container for the user foo, and similarly for other users. Then a plain GET /bar/baz will not yield a pointer to a storage.
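For concreteness, here is roughly what the current traversal looks like when it does work, as I understand the Protocol (the paths are made up; the type link on the root container is the part the spec prescribes):

HEAD /foo/bar
200 OK

HEAD /foo/
200 OK
Link: <http://www.w3.org/ns/pim/space#Storage>; rel="type"

So /foo/ is the root container, and therefore the storage, for /foo/bar. The problem is what happens when no ancestor of the resource carries that link.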
To remedy this problem, I suggest that we have a way that clients can discover the pods as a SHOULD level requirement (so that a server may support anonymous pods, but otherwise should make them discoverable).
There are basically two different mechanisms that I have in mind.
One is that we have a resource .well-known/solid/storages that provides an RDF resource listing the storages.
The other is that we define that the response payload to OPTIONS * includes such a listing.
Personally, .well-known has always itched my aesthetic sensibilities (that much out-of-band knowledge of resource identifiers just feels wrong), so I would prefer the latter if possible.
Orthogonal to that is the content of that response. I see two options; one is, for example:
<> pim:storage </pods/users/foo/> ,
    </pods/users/bar/> .
However, there's no value in this being self-describing. Also, it appears to stretch the definition of pim:storage. Even though its domain is undefined, the vocab says rdfs:comment "The storage in which this workspace is", and this resource is certainly no workspace.
Therefore, I would rather suggest we do, for example:
</pods/users/foo/> a pim:Storage .
</pods/users/bar/> a pim:Storage .
That tells you exactly what you need to know in concise terms, and without stretching the definition of anything, it describes the storages exactly the way they describe themselves.
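Putting the two pieces together, a hypothetical exchange could look like this (nothing here is defined yet; the resource name and the paths are only placeholders):

GET /.well-known/solid/storages
200 OK
Content-Type: text/turtle

@prefix pim: <http://www.w3.org/ns/pim/space#> .

</pods/users/foo/> a pim:Storage .
</pods/users/bar/> a pim:Storage .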
However, this can't be used for discovery in general, because it would require the entire URI space to be covered by storages, and that may not be the case.
I don't quite understand this. Any given resource is part of only one storage and there are no overlapping storages.
If I have understood it correctly, you could, for example, have /pods/users/foo/ as the root container for the user foo, and similarly for other users. Then a plain GET /bar/baz will not yield a pointer to a storage.
No. /pods/users/foo/ and /bar/baz are under different storages.
"The storage in which this workspace is"
That's old. There was an update, and the current definition is https://github.com/solid/vocab/blob/bd9ce1c7806254bf1307c73f2b5e0dd3eeaab101/space.n3#L91-L92 . I'll follow up with Tim to push the latest to w3.org.
That tells you exactly what you need to know in concise terms
That works only for a 200 response. Clients may need to navigate up the hierarchy, e.g., to find the storage owner, irrespective of their permission on each resource along the way. This was the consensus.
To remedy this problem,
I don't see a problem :) but I'll respond to the following:
I suggest that we have a way that clients can discover the pods as a SHOULD level requirement
The Protocol describes one way that is guaranteed for clients to discover the storage. Clients are not required to discover the storage.
(so that a server may support anonymous pods, but otherwise should make them discoverable).
What's an "anonymous pod" - defined anywhere?
Close issue?
No. /pods/users/foo/ and /bar/baz are under different storages.
What is it that requires /bar/baz to be in a storage?
Does that mean that I am not allowed to have something like a /static/ which is just a Web server? That would seem like an infringement on the rights of a server.
What is it that requires /bar/baz to be in a storage?
How was /bar/baz created? How was /bar/ created? How was / created - that's given; root container / Storage.
Does that mean that I am not allowed to have something like a /static/ which is just a Web server?
Of course /static/ can exist. That can indeed be the root container / Storage. It is just that resource /bar/baz wouldn't be under that storage. /bar/baz may be in /bar/, and if that's not the storage, it'll be in / as storage.
What is it that requires /bar/baz to be in a storage?
How was /bar/baz created? How was /bar/ created? How was / created - that's given; root container / Storage.
A legacy CMS system or something. Whatever, but not the Solid protocol.
Does that mean that I am not allowed to have something like a /static/ which is just a Web server?
Of course /static/ can exist. That can indeed be the root container / Storage. It is just that resource /bar/baz wouldn't be under that storage. /bar/baz may be in /bar/, and if that's not the storage, it'll be in / as storage.
OK, I didn't quite parse that, but / can't be a storage, because then it would be overlapping with /pods/users/foo/, right?
And BTW, /pods/ can't be in a storage either, right, for the same reason. It follows from the requirement that there can be more than one storage, and the requirement that they are non-overlapping, that there has to exist some space that is not necessarily contained in any storage.
One is that we have a resource .well-known/solid/storages that provides an RDF resource listing the storages.
FYI, the notifications proposal also defines .well-known/solid: https://github.com/solid/notifications-panel/pull/3
If / is a Storage, then there are no other Storages in that path.
If /pods/ is a Storage, there can be other Storages under /, e.g., /bar/, /static/.
Yes, and therefore, if /pods/users/foo/ and /pods/users/bar/ are storages, then /pods/users/ and /pods/ can't be storages, and so, the discovery method breaks down.
the discovery method breaks down
What breaks exactly? There is nothing prior to Storage. The path may be in the URI, but there is nothing to see in /pods/users/ or /pods/. Discovery of Storage starts by giving a URI as input.
Yes, but which? If you do not know anything about the structure of the pod, how do you get started?
I'll bite. How did a client come across the pod?
How do you find out my inbox? What's the input?
Clients select or arrive at a target somehow. Do you think it would help to introduce language along the lines of https://www.w3.org/TR/ldn/#discovery :
The starting point for discovery is the resource which the notification is to or about: the target. Choosing the most appropriate target resource from which to begin discovery is at the discretion of the sender or consumer, since any resource (RDF or non-RDF) may have its own Inbox.
No, I don't think that would help.
Say that Solid lives in a Web ecosystem, where servers manage pods alongside legacy systems, using the same origin. You may then have arrived at the server by very conventional means: a link to a non-Solid resource, a news article, whatever stuff you find on the Web today. And then you, as the client, want to discover whether there are any pods there. The content admin may not have made that data available on the resource that you already have, not very accommodating that is, but you want to know.
We can't control all the content out there, but in order to lead a healthy existence in an ecosystem, including a ramp-up phase, I think it is vitally important to have some hooks that you can be sure exist.
The content admin may not have made that data available on the resource that you already have, not very accommodating that is, but you want to know.
Hmm, I think that's a broader question, or orthogonal to discovering Storage, and there may be a simple answer to it by checking the type of the resource. For example, LDP mentions rel=type ldp:Resource (https://www.w3.org/TR/ldp/#ldpr-gen-linktypehdr) to indicate LDP support, and for Solid there could be something like solid:Resource (see also https://github.com/solid/specification/issues/194#issuecomment-694828342).
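As a sketch, that check could be as simple as looking at the type links on the response you already have (ldp:Resource is what LDP defines today; a solid:Resource type would be new and is purely hypothetical here):

HEAD /bar/baz
200 OK
Link: <http://www.w3.org/ns/ldp#Resource>; rel="type"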
So, testing out the algorithm:
GET /pods/users/
200 OK
Content-Type: text/html
<h1>These are our users</h1>
<ul><li>foo</li><li>bar</li></ul>
GET /pods/
200 OK
Content-Type: text/html
<h1>Yeah, we really love Solid!</h1>
GET /
200 OK
Content-Type: text/html
<h1>Welcome to our site!</h1>
Result: Storage not discovered.
GET /pods/users/foo/assets/images/bar.jpg
200 OK
Content-Type: image/jpeg
sdoglghsdgfjh
GET /pods/users/foo/assets/images/
200 OK
Content-Type: text/turtle
<> ldp:contains <bar.jpg> .
GET /pods/users/foo/assets/
Accept: text/html
200 OK
Content-Type: text/html
<h1>Seriously</h1>
<p>I've done six requests now, and I still haven't discovered a storage.
How long do you actually expect me to keep doing this?
This is no way to make a discovery protocol.
Can't I just have a list of storages?</p>
Result: The client received no guidance, it wasn't required to, and so it gave up. Very reasonably, I might add.
The content admin may not have made that data available on the resource that you already have, not very accommodating that is, but you want to know.
Hmm, I think that's a broader question, or orthogonal to discovering Storage, and there may be a simple answer to it by checking the type of the resource. For example, LDP mentions rel=type ldp:Resource (https://www.w3.org/TR/ldp/#ldpr-gen-linktypehdr) to indicate LDP support, and for Solid there could be something like solid:Resource (see also #194 (comment)).
There are too many heuristics in there. We need a simple mechanism just to get started.
First of all, what kind of an evil application are you that uses GET instead of HEAD just to discover Storage in the HTTP header? The spec allows both for obvious reasons...
And again, the response could've been 403.
Need to discover the root container / Storage somehow. There has to be an input. The Protocol already works with RFC 3986 on hierarchical paths and containment. So what's in place is a quick way to break the URI into segments and check them. The Protocol also states:
Clients may check the root path of a URI for the storage claim at any time.
When you say:
Result: Storage not discovered.
If it is a Solid server, it should've had Link rel=type Storage in the header. If it is not discovered by that point, the application would know it is not working with a Solid server.
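For contrast, when the traversal does hit a Solid server, the same sequence with HEAD looks roughly like this (paths made up):

HEAD /pods/users/foo/assets/images/bar.jpg
200 OK

HEAD /pods/users/foo/assets/images/
200 OK

HEAD /pods/users/foo/assets/
200 OK

HEAD /pods/users/foo/
200 OK
Link: <http://www.w3.org/ns/pim/space#Storage>; rel="type"

Result: the storage containing bar.jpg is /pods/users/foo/.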
There are too many heuristics in there. We need a simple mechanism just to get started.
We already have something but you want something different.
What I'm saying is that the use case of discovering a Solid server's Storage is different from the use case of determining whether a server supports the Solid Protocol.
If you don't like the discovery algorithm looking for rel=type Storage, or a resource with rel=type solid:Resource (for a Solid server), we could introduce pim:storage in the HTTP Link header (similar to what we currently have for the message body):
Clients can discover the storage which contains the resource of an HTTP HEAD or GET request target by checking for the Link header with rel="http://www.w3.org/ns/pim/space#storage". The target of the relation is the storage (pim:Storage).
And the requirement for the server obviously.
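To illustrate the proposed relation, a response for an arbitrary resource could then carry something like this (the target URI is hypothetical):

HEAD /pods/users/foo/assets/images/bar.jpg
200 OK
Link: </pods/users/foo/>; rel="http://www.w3.org/ns/pim/space#storage"

With that in place, a client would get the storage in a single request, without walking up the hierarchy.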
Can't I just have a list of storages?
Completely different use cases. 1) The storage of a resource is different from 2) the list of storages in a pod, which is different from 3) whether a server supports the Solid protocol.
Can't I just have a list of storages?
I do not understand the use case motivating the feature that lists all storage locations on a Solid server. Surely there are cases where storage location would not be discoverable by public agents.
The use case that I do understand is the following: As a user, what are the locations of my storage roots on a Solid server.
For example, I may have a WebID at https://id.example/{username} with several Solid Pods at: https://solid.example/{uuid}/. How do my apps know where my data is stored? Assume that I don't want to advertise these locations to the world in my WebID profile.
It would be useful to have an endpoint on the storage server that, given an access token asserting a user's identity, lists the data pods that this user owns. Clearly, this endpoint would require authentication. I also see this only being relevant in cases where there are multiple storage roots on a Solid server.
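Purely as a sketch of the idea, and not something specified anywhere (the path, the token, and the response shape are all invented):

GET /storages
Authorization: Bearer <access-token>

200 OK
Content-Type: text/turtle

@prefix pim: <http://www.w3.org/ns/pim/space#> .

<https://id.example/username> pim:storage <https://solid.example/uuid-1/> , <https://solid.example/uuid-2/> .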
Can't I just have a list of storages?
I do not understand the use case motivating the feature that lists all storage locations on a Solid server. Surely there are cases where storage location would not be discoverable by public agents.
Primarily, it is not so much a use case as a quality attribute: conformance should be easy to verify. The current specification makes that difficult.
Surely there are cases where storage location would not be discoverable by public agents.
Nod.
I do not understand the use case motivating the feature that lists all storage locations on a Solid server.
Nod.
The use case that I do understand is the following: As a user, what are the locations of my storage roots on a Solid server.
I do not understand the use case motivating the feature that lists all storage locations of an agent on a Solid server.
The use case that I do understand is the following: As a user, what are the locations of my storages on the Web.
Discovery of storages starts from the agent because the agent can have storages on different origins. One way that apps can start discovery of an agent's storages is from the WebID Profile document (accessible by any agent with Read). Another would be from a different (access-controlled) resource listing the storages, such as the pim:preferencesFile or, if necessary, a dedicated property, e.g., storageIndex. The statements will be in this form: <agent> pim:storage <storage>. I think we have this use case covered.
Edit: In the Protocol, the subject would be the agent as per the use case above:
Clients can discover a storage by making an HTTP GET request on the target URL to retrieve an RDF representation [RDF11-CONCEPTS], whose encoded RDF graph contains a relation of type http://www.w3.org/ns/pim/space#storage. The object of the relation is the storage (pim:Storage).
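Concretely, the statements an app would look for, whether in the WebID Profile or in an access-controlled resource linked from it, have this shape (URIs hypothetical):

@prefix pim: <http://www.w3.org/ns/pim/space#> .

<https://id.example/foo#me> pim:storage <https://solid.example/pods/users/foo/> .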
I have started to think about this differently: it is also about authority over the URI space of the storage.
There may/will be other APIs on a Solid server, and they will occupy parts of the URI space. It is further very likely that parts of the storage's space will be occupied by URIs naming things that aren't controlled by the storage. In fact, if the storage is at /, then there will be URIs of authentication and other mechanisms that aren't controlled by the storage.
The question is then, who has the authority to name things in the URI space of a storage? If we leave that to some "server admin" entity, which can have a server-wide discovery mechanism, then that "server admin" entity can override the wishes and priorities of the storage's owner or user.
I think we should be very careful about using server-wide discovery mechanisms in other protocols that define APIs. Instead, the server should make storage discovery really easy, and thereby leave control of their own URI space to the storage owner or user. The root container is a better place to manage further discovery.
I think that with the resolution of https://github.com/solid/conformance-test-harness/pull/119 it now makes sense to remove this from the milestone and return to it for 1.0. While I still think the current behavior is not sufficiently testable (as also confirmed by the above PR), we don't need to prioritize it right now.
Any opposition to removing it from the milestone?
Having had time to think about this further, I retract my idea of having a listing of a server's storages; it is not sufficiently aligned with privacy expectations in the case where a server hosts many storages (my assumption has been that that's a rare case, since you'd want your own domain).
I still think the algorithm needs more work; it doesn't scale to have to move up the tree, which could be large. Whether that should be addressed in this issue or more generally in #355 is an open question.
So, I have a slightly different use case from the ones that have been discussed already. I do not care about listing all the storages of a server (and agree with others that this is probably a bad idea), but as an application developer I need to know where to store data given a WebID. Therefore, I would like to propose extending section 4.1 on storage to include something along the lines of the following:
Servers exposing the storage resource MUST advertise it by including it in a pim:preferencesFile linked to from the WebID document.
It would also require an addition to the WebID section along the lines of:
WebID Documents MUST contain at least one (exactly one?) pim:preferencesFile triple.
As stated earlier, exposing the storage directly in the WebID document itself won't work, because it is a public document and not all storages will be public. However, there MUST be a way for application developers, given a WebID, to get a list of storages owned / controlled by that WebID. The natural place to put it, then, would be in a private pim:preferencesFile.
This aligns with the expectation set by the WebID profile group that a well-formed document MUST include a pim:preferencesFile (https://github.com/solid/webid-profile/blob/main/notes/pre-final-draft.md#3-private-preferences---pimpreferencesfile).
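Concretely, the chain I have in mind would look something like this (all URIs hypothetical; only the first triple is in the public WebID document, the rest sits behind access control):

# Public WebID document at https://id.example/foo#me
@prefix pim: <http://www.w3.org/ns/pim/space#> .
<https://id.example/foo#me> pim:preferencesFile <https://solid.example/settings/prefs.ttl> .

# Private preferences file at https://solid.example/settings/prefs.ttl
@prefix pim: <http://www.w3.org/ns/pim/space#> .
<https://id.example/foo#me> pim:storage <https://solid.example/pods/users/foo/> ,
    <https://solid.example/pods/users/foo-archive/> .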
That document, however, suggests that if a preferencesFile does not exist, it should be created by application developers. This puts me, as an application developer, in an impossible catch-22: if I have a WebID that has no pim:storage and no pim:preferencesFile, I cannot create a pim:preferencesFile because I don't know where to store it. Using the currently advertised way of accessing a storage (the Link header) only works if the WebID document exists as a subpath of that storage, and even then it does not allow me, as an application developer, to determine all storages a user controls, only the one the WebID document is stored in.
This is something that MUST be exposed by the server in some way, and this seems like the most reasonable way to do it.