specification Discuss returning 404 for privacy reasons

Jul 18 '19 22:07 RubenVerborgh

Some thoughts:

I like the 404 here.

Would it be useful to make sure that the 404 response MUST be accompanied with an expiration time for caches?

If so, a related concern: if implementations only include freshness information for the purpose of hiding, then the evilApp could infer that a resource is being hidden even though it can't access it. So, MUST all 404s always be accompanied by an expiration time so as not to leak that information?

Note that if the expiration time on a 404 is only a SHOULD or MAY, the evilApp has more uncertainty when trying to infer whether a resource exists.
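
One way to read the concern above: a 404 used for hiding should be indistinguishable from a 404 for a genuinely missing resource, freshness headers included. A minimal Python sketch, where `not_found` and the 60-second lifetime are hypothetical and not taken from any Solid server:

```python
from typing import Dict, Tuple

# Hypothetical policy value: one freshness lifetime shared by every 404.
NOT_FOUND_MAX_AGE = 60


def not_found(hidden: bool) -> Tuple[int, Dict[str, str]]:
    """Build a 404 whose headers do not depend on why it is returned.

    `hidden` is True when the resource exists but is being concealed. It is
    deliberately ignored so that both cases look the same on the wire.
    """
    headers = {"Cache-Control": f"max-age={NOT_FOUND_MAX_AGE}"}
    return 404, headers
```

If the header were added only when `hidden` is True, its presence alone would tell the client that something is there.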

Jul 19 '19 07:07 csarven

Hi, new to this community but have some thoughts here:

My impression is that the active use case for 404 in Solid is to check whether a username is taken - which can be done at root level (and return a 404).

Any requests in deeper directories could return a 403 Forbidden if the ACLs don't allow public read. The 403 can also be made default regardless of whether or not the resource/path is valid.

This would prevent inference of resource existence, but avoids using 404 when a resource may in actual fact exist (obfuscation).

Jul 29 '19 14:07 humont

Hi @humont,

The 403 can also be made default regardless of whether or not the resource/path is valid.

Good point, that does not seem to be disallowed by RFC7231:

The client MAY repeat the request with new or different credentials. However, a request might be forbidden for reasons unrelated to the credentials.

The RFC's suggestion to use a 404 is only a MAY:

An origin server that wishes to "hide" the current existence of a forbidden target resource MAY instead respond with a status code of 404 (Not Found).

However, the question is why we would prefer a 403 over a 404. You write:

avoids using 404 when a resource may in actual fact exist (obfuscation).

But such obfuscation is explicitly allowed by the spec (see above); and a constant 403 is no less an obfuscation. So at the moment, I am still leaning toward 404 in case of privacy reasons, if the pod user so desires. On the other hand, a 403 makes it easier, since no such preference should be stated then, and it hints at authenticating (whereas a 404 does not).

My impression is that the active use case for 404 in Solid is to check whether a username is taken

That is not an official API though (and not the point in general); would be a topic of pod management.

Jul 29 '19 20:07 RubenVerborgh

But such obfuscation is explicitly allowed by the spec (see above); and a constant 403 is no less an obfuscation. So at the moment, I am still leaning toward 404 in case of privacy reasons, if the pod user so desires. On the other hand, a 403 makes it easier, since no such preference should be stated then, and it hints at authenticating (whereas a 404 does not).

There is ambiguity as to which should be used, that's for sure. After browsing around on Stack Overflow and various blog posts, the common wisdom seems to be that people assume a 403 implies existence without permission (despite this not being in line with the actual definition of 403). So there's that...

My preference (for what it's worth) is a semantic one: 403 = "you're not allowed to ask this question", whereas 404 = "your question is allowed, but I won't tell you the answer".

As far as obfuscation goes, 404 seems more fit for purpose, as it leaves open the question: "did I not find the resource in general, or did I not find it just for you?"

Jul 30 '19 09:07 humont

I would also vote for the '404 by default, if unauthorized or not found' option. (All social media platforms, from LiveJournal to GitHub (private repos) to Facebook, take this approach.)

That said, I do think that it would be useful to add a Web Access Control term (something like wac:allowRequestPermission) that signals a 403 instead of a 404 and allows users to ask for permission.

Jul 30 '19 17:07 dmitrizagidulin

@dmitrizagidulin Isn't it a given that the requesting agent can come forward with credentials on any resource? If I'm interpreting correctly, I think signalling allowRequestPermission along with a 404 works contrary to obfuscating whether a resource actually exists or is reachable.

Jul 31 '19 11:07 csarven

I would only ask how much the 404 approach will cause applications to miss the option of authenticating (if they have not already) and re-requesting.

So, re 404, I think it'd be useful to cover both ends in the Solid spec. The 404 in RFC:

or is not willing to disclose that one exists.

can be supported/clarified by adding the following:

"The client MAY repeat the request with new or different credentials." -- repurposing the text from 403.

Aside: I wonder if this was already considered in RFC7231 and what was the rationale to omit.

Jul 31 '19 11:07 csarven

Speaking as a user, I hate 404 when I'm unauthorized because my login has expired or the like, because it often enough means I go in loops of "where'd that thing I know was here go?" Yes, the RFC says it's OK to do this, but the 403 response seems better to me -- because whether or not the thing does exist, it shows that there might be a change with different authentication, while 404 suggests that authentication doesn't matter.

Jul 31 '19 21:07 TallTed

I agree with @ajs6f in the Trellis issue referenced by @acoburn above (https://github.com/trellis-ldp/trellis/issues/454). Basically any conversion of 403 to 404 should be done at the outermost layer of the architecture (and not in the internals, to @ajs6f's point). But I'd like to see that conversion be controllable/overridable on a per-resource basis (even though an administrator may also provide a default 'conversion setting' for all resources served by an entire Solid server), and controllable by the user themselves (i.e. it's just another piece of resource meta-data that users can set explicitly).

Aug 12 '19 22:08 pmcb55

Punting the responsibility/decision making to the "outermost layer of the architecture" certainly sounds reasonable if, and only if, there is an "outermost layer" to speak of. Certainly we expect a Solid server to be self-contained (i.e., working without any dependency on or knowledge of the outer layer) to the point that it has some opinion on the primary UC, whether that's realised via 403 or 404 out of the box, or even configurable. Put differently, if we do acknowledge the UC, the Solid spec(s) should probably say something about it, at the very least for the "internal architecture", so that it is prescribed and has tests. Anything pertaining to the "outermost layer" may be out of the Solid spec's scope or at most be only descriptive (as opposed to prescriptive) in the end.

Aug 16 '19 11:08 csarven

@csarven I'm not sure I follow really. For me the 'outermost' layer of a Web server is pretty easy to define and configure - it's just a JAX-RS filter, and/or the last (or first) processor in a Camel route. In fact I'd see it as a classic example of the filter pattern, i.e. filtering a response to set its status code to either 403 or 404 based on the context of the request itself and the configuration of the server. If by UC you mean Use-Case, I certainly wouldn't see that being defined by the Solid server or spec, instead it's defined by the context of the request (i.e. the preferences of the particular user) and the configuration of the server (i.e. the preferences of the Pod provider). So from a spec perspective I'd say the server CAN set a 403 or a 404, but that that SHOULD be override-able by user preferences set per resource.
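
The filter idea described above can be sketched in a few lines. This is a language-neutral illustration written in Python rather than an actual JAX-RS filter or Camel processor; `HidePolicy`, `filter_status`, and the example paths are hypothetical names, not part of any Solid server:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HidePolicy:
    """Hypothetical configuration: a server-wide default plus per-resource overrides."""
    server_default_hide: bool = True
    per_resource: Dict[str, bool] = field(default_factory=dict)

    def should_hide(self, path: str) -> bool:
        # A per-resource preference, when present, wins over the server default.
        return self.per_resource.get(path, self.server_default_hide)


def filter_status(path: str, status: int, policy: HidePolicy) -> int:
    """Outermost response filter: rewrite 403 to 404 when the policy says to hide."""
    if status == 403 and policy.should_hide(path):
        return 404
    return status
```

With `HidePolicy(per_resource={"/notes/shared": False})`, a 403 for `/notes/shared` passes through unchanged while a 403 for any other path becomes a 404. The internals only ever produce 403, which is the separation argued for in the Trellis issue.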

Aug 17 '19 17:08 pmcb55

Having let this gel for a while, it occurs to me that (depending on the usage scenario) different responses may be appropriate for the same resource depending on whether the user is known or unknown, and on specifics of a known user. That is, it may be appropriate to return 404 to unauthenticated users, and 403 to (some or all) authenticated users.

Nov 10 '19 03:11 TallTed

To me, this sounds like an implementation issue that the Solid spec does not need to address, or might address in a best practices documentation. The "404 for privacy" is already there in RFC7231, and implementors may want to heed that advice if they are concerned with privacy, which most will want to be, and if so, they have many ways to achieve it, as @pmcb55 says, it is a filter pattern thing.

Even if many will want it so, I fail to see the value of turning it into a stronger normative feature of Solid.

Nov 11 '19 11:11 kjetilk

Not sure I agree. Privacy is a core issue to Solid. And different servers handling this behavior differently might become a source of confusion.

Nov 11 '19 16:11 dmitrizagidulin

I suggest that in order to reach consensus we should dive deeper with a different perspective.

Pat, whether clients can (in)directly influence responses as such seems to be a new feature. If it is sensible for a client to dictate that, at the very least, we need the notion of "hidden" and how to set/undo it for a resource. So, let's defer to another issue for that.

Ted suggests a reasonable default. We should look at the specifics of both authn/z.

Kjetil, I agree that leaving things as is would be sufficient but it is not particularly easy to control in our environment without making the bridge that I think Dmitri is after.

Dmitri, let's investigate further and see if we can bubble anything up to a default or a recommendation. To be a bit more grounded, can resolving https://github.com/solid/specification/issues/116 (and other issues at that level) reveal anything useful? For example, if a user is unauthorized to read a resource, does it make sense to hide all references to that resource, e.g., in a container listing? Intuitively, yes, but it is worth considering the design anyway. What something like that tells me is that we could specify a few specific scenarios considering privacy instead of a catch-all. So, bubbling that up may mean that 404 is a reasonable default for such scenarios - which is already hinted at in the RFC. But where does that leave 403? Only for Write?

Do you find something along these lines helpful:

"The client MAY repeat the request with new or different credentials." -- repurposing the text from 403.

What kind of a recommendation or non normative text do you think will help?

Can we come up with more scenarios?

Nov 12 '19 11:11 csarven

Following the principle of least confusion, I'm still not sure where the least confusion is. I tend to side with @TallTed 's first comment, that a 404 is confusing if the user next logs in and finds that the resource exists anyway. Also, I'm then more confused by your second comment, @TallTed , because surely, if a resource exists and the user is unauthenticated, the only appropriate response is 401? I.e., they need to have a chance to authenticate before being told that a resource that exists can't be accessed by them?

That leaves a pretty small group that this would be a relevant response for: a class of users that are authenticated, but not trusted enough to even be told whether a resource exists. That could certainly be acl:AuthenticatedAgent users, as #32 is relevant in this situation. Then, language around retrying with different credentials might make sense, but 404 is a fairly catch-all error, and I don't think it makes sense to overload it too much. Given this, it still doesn't seem to me that it is a big issue in Solid.

Moreover, given that there are low-impact ways to do this, I wonder if the first milestone is appropriate, I think we should bump this issue to the June milestone.

Nov 12 '19 12:11 kjetilk

@kjetilk — I think the sum of my comments is "there's no single answer, appropriate to all deployments." Also that, unfortunately, RFC blurs the line between authentication and authorization.

404 Not Found is permitted by HTTP RFC when the admin wants to conceal the existence of a thing from users (whether authenticated or not) who aren't authorized to see the content of the thing. As an admin/server, I may love this response.
404 Not Found is definitely confusing if you know something exists, and just didn't realize you were not authenticated, or were authenticated as a different user. As a user/client, I hate this response.
401 Unauthorized is more user-friendly when trying to access (both for READ and WRITE) something which exists, whether the user/client is unauthenticated or authenticated as a non-privileged user.
403 Forbidden is problematic — because according to RFC, "Authorization will not help and the request SHOULD NOT be repeated." But common sense dictates that this should be returned when authenticated as a non-privileged user, in which case changing authentication will help — but admins might choose to return 404 in this case....
401 Unauthorized may be a good response for most requests when unauthenticated. To wit —
- Explicitly public resources get delivered with 200 or whatever.
- Restricted or nonexistent (or any other category I'm not thinking of) result in 401, which leads to authentication, which gets whatever the admin feels appropriate (403 or 404) for the requested resource.
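
The flow in the last item can be written down as a small decision function. This is only a sketch of that one scenario: `respond` and its parameters are hypothetical, and returning a plain 404 to authenticated agents for a nonexistent resource is an assumption made here, since the comment leaves that to the admin:

```python
def respond(exists: bool, public: bool, authenticated: bool,
            authorized: bool, hide_unauthorized: bool) -> int:
    """Pick a status code following the unauthenticated-first flow."""
    if exists and public:
        return 200  # explicitly public resources are simply delivered
    if not authenticated:
        return 401  # restricted or nonexistent: ask for credentials first
    if not exists:
        return 404  # assumption: authenticated agents get a plain 404 here
    if authorized:
        return 200
    # Admin's choice for authenticated but non-privileged agents.
    return 404 if hide_unauthorized else 403
```

Note that an unauthenticated client sees 401 for both restricted and nonexistent resources, so nothing about existence leaks before authentication.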

The above is not meant to be an exhaustive list of scenarios, but I think it covers most.

Nov 12 '19 15:11 TallTed

Thanks a lot, @TallTed , that was a good clarifying comment!

It seems the worst situation to deal with is the one where the user is authenticated, but should try to authenticate with different credentials.

I do think we should bump this to the June milestone though; it has neither the importance, the urgency, nor the manpower to be solved in a month.

Nov 12 '19 15:11 kjetilk

Novell gave the 401/404 choice to ACL Controllers with a permission called "File Scan". It was not paired with a principal, so you couldn't say that group X sees a 401 while everyone else sees a 404.

Overall, I feel like they had a pretty well-thought-out mapping of rights to actions which narrowed the vulnerability of someone deleting and replacing a file just to acquire ACLs control over it.

May 02 '20 17:05 ericprud

:bell: Edit: Tables in this comment are a WIP - fixing errors and including consensus as they come up - Majority of this information is already in the spec or can be deduced. Remaining bits will be reviewed and transitioned to spec.

Taking what's discussed above into account, below is a way to reconcile this issue - using some background from https://github.com/solid/specification/issues/116

The order of status codes with least information leakage: 401, 403, 404, 409 based on the following:

Solid's notion of hierarchical containment is loosely coupled with resource-based access control. Resources can be observable or discoverable - "knowable" - by agents having Read access privilege either on the resource or its container (inherited). Some auxiliary resources are discoverable by agents having Read or Control access privilege on the subject resource.

The existence of a resource may be unknowable in that a 403 implies neither that a resource exists nor that it does not exist. When an agent is forbidden to allocate a URI to a resource, 403 is used.

When an agent has Read access to C/ or C/R, the existence of C/R can be known.

When an agent has Read access to C/R, the state of C/R can be known.

Then, 404 and 409 indicate that an agent is authorized to know if a resource exists or its state can be read.

If credentials are not required, 401 doesn't apply, and 403 isn't particularly useful. Then, 2xx or 4xx (besides 401, 403) can be used.

CORS preflight-requests have a successful response (2xx).

The tables below focus on authorization and resource state. All C/'s (containers) apply an access mode that can be inherited by C/R. Access mode marked with - indicates no access mode is explicitly set on resource. To provide some clarity, the tables incorporate some scenarios where access privileges are (hypothetically) set on non-existing resource.

GET C/R
HEAD C/R
OPTIONS C/R

C/	C/R	C/R exists	C/R doesn't exist
-	-	403	403
-	Read	200	404
Read	-	200	404
Read	Read	200	404
Read	Write	403	404
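
The five rows above follow from the two "knowable" rules stated earlier: Read on C/ or C/R reveals existence, and Read on C/R reveals state. A minimal Python sketch, assuming that `-` on C/R means the container's modes are inherited; `get_status` and the `Modes` alias are hypothetical names:

```python
from typing import FrozenSet, Optional

Modes = Optional[FrozenSet[str]]  # None stands for "-" (nothing set explicitly)
EMPTY: FrozenSet[str] = frozenset()


def get_status(container: Modes, resource: Modes, exists: bool) -> int:
    """Status for GET or HEAD on C/R, as encoded by the table rows."""
    inherited = container or EMPTY
    # Assumption: a resource with no modes of its own inherits the container's.
    effective = resource if resource is not None else inherited
    can_read_state = "Read" in effective
    can_know_existence = can_read_state or "Read" in inherited
    if exists:
        return 200 if can_read_state else 403
    return 404 if can_know_existence else 403
```

The `Read` / `Write` row is the interesting one: the agent may learn that C/R is absent (404) through the container, but may not read it when it exists (403).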

POST C/
Slug: R

/	C/	C/ exists	C/ doesn't exist
-	-	403	403
-	Read	403	404
-	Append	201	403
-	Read,Append	201	404
Read	-	403	404
Read	Append	201	404

Servers allocate unique URIs to resources on POST C/ requests. "C/R exists" is not applicable.

PUT C/

C/	C/ exists	C/ doesn't exist
-	403	403
Read	403	403
Write	200	201

PUT C/R

C/	C/R	C/R exists	C/R doesn't exist
-	-	403	403
-	Read	403	403
-	Append	403	403
-	Write	200	403
Read	-	403	403
Append	-	403	403
Write	-	200	201
Append	Write	200	201

Create requires Append (or Write) on C/ and Write on C/R. Replace requires Write on C/R.
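
The create and replace rule can be checked against the table mechanically. The following Python sketch assumes, as the table appears to, that `-` on C/R inherits the modes of C/; `put_status` is a hypothetical helper, not an API of any server:

```python
from typing import FrozenSet, Optional

Modes = Optional[FrozenSet[str]]  # None stands for "-" (nothing set explicitly)
EMPTY: FrozenSet[str] = frozenset()


def put_status(container: Modes, resource: Modes, exists: bool) -> int:
    """Status for PUT C/R under the create/replace rule."""
    on_container = container or EMPTY
    # Assumption: a resource with no modes of its own inherits the container's.
    effective = resource if resource is not None else on_container
    if "Write" not in effective:
        return 403  # both create and replace need Write on C/R
    if exists:
        return 200  # replace
    # Create additionally needs Append or Write on the container.
    return 201 if on_container & {"Append", "Write"} else 403
```

Unlike the read tables, no branch returns 404: a PUT to a missing resource is either a permitted create or a refusal.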

PATCH C/R

PATCH based on application/sparql-update data type:

C/	C/R	Payload	Match	C/R exists	C/R doesn't exist
-	-			403	403
-	Read			403	404
-	Append	INSERT		200	403
-	Append	DELETE		403	403
-	Write	INSERT		200	403
-	Write	DELETE		403	403
Append	Write	INSERT		200	200
Append	Write	DELETE		403	403
-	Read,Write	DELETE	true	200	404
-	Read,Write	DELETE	false	409	404
-	Read,Write	DELETE+INSERT	false	?	?

Create requires Append (or Write) on C/ and Write on C/R. Payload with DELETE requires Read on C/R.

DELETE C/R

C/	C/R	C/R exists	C/R doesn't exist
-	-	403	403
-	Read	403	404
-	Append	403	403
-	Write	403	403
Read		403	404
Append		403	403
Append	Read	403	404
Write	-	204	403
Write	Read	403	404
Write	Append	403	403

DELETE C/

Deleting C/ behaves like deleting C/R but with additional requirements:

C/	C/ empty	C/ exists	C/ doesn't exist
-		403	403
Read		403	404
Append		403	403
Write		403	403
Read,Write	true	204	404
Read,Write	false	409	404
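
Read as a decision procedure, the table above says that deletion needs both Read and Write, that Read alone still reveals absence, and that a non-empty container yields a conflict. A hedged Python sketch of exactly these rows, with `delete_container_status` as a hypothetical name and the modes column taken as the agent's effective modes:

```python
from typing import FrozenSet, Optional

Modes = Optional[FrozenSet[str]]  # None stands for "-" (nothing set explicitly)


def delete_container_status(modes: Modes, exists: bool, empty: bool) -> int:
    """Status for DELETE C/, following the rows of the table."""
    granted = modes or frozenset()
    if not {"Read", "Write"} <= granted:
        # Refused; an agent with Read may still learn that C/ is absent.
        if "Read" in granted and not exists:
            return 404
        return 403
    if not exists:
        return 404
    # Only an empty container can be deleted; otherwise report a conflict.
    return 204 if empty else 409
```

The 409 is only reachable with Read, which matches the earlier statement that 404 and 409 are reserved for agents authorized to know about existence or state.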

Aug 30 '20 22:08 csarven

C/ | C/R | C/R exists | C/R doesn't exist
Read | - | 200 | 404

Shouldn't that be Read | - | 403 | 404 ? If you don't have read access on C/R then you shouldn't get a 200

Sep 03 '20 09:09 michielbdejong

POST C/ Slug: R

Slug should only be used as advice, and if C/R exists, the server should pick a different location. It should always return the location along with a 201.

Sep 03 '20 09:09 michielbdejong

PUT C/R
C/ | C/R | C/R exists | C/R doesn't exist
Read,Write | - | 200 | 201

Neither creating nor updating C/R should be allowed if you don't have write access to C/R itself.

Sep 03 '20 09:09 michielbdejong

PATCH C/R
C/ | C/R | C/R exists | C/R doesn't exist

| Append | 200 | 201

Only if the PATCH is an append-only patch

Append | - | 200 | 201

All PATCH operations should be forbidden if you have neither write nor append on C/R

Write | - | 200 | 201

Again, all PATCH operations should be forbidden if you have neither write nor append on C/R

You're also missing a few combinations in the table for PATCH. You can remove the column for C/ there, it's irrelevant. The cases for a PATCH with only INSERT are:

-
Append or Write
Read and (Append or Write)

The cases for a PATCH with both INSERT and DELETE are:

-
Write
Read, Write

Sep 03 '20 09:09 michielbdejong

Shouldn't that be Read | - | 403 | 404 ? If you don't have read access on C/R then you shouldn't get a 200

Right, normally a 403, but the tables factor in default:

access mode marked with - indicates no control is set on resource - default access mode is determined.

It just means that the effective access mode is determined via the inheritance algorithm: C/R doesn't have its own ACL but C/ does, so C/R can be read.

Perhaps what didn't come out clearly was that all C's in the tables have acl:default. I can make that clear in the proposal/analysis above. Let me know if this clarifies the tables though.

Slug should only be used as advice, and if C/R exists, the server should pick a different location. It should always return the location along with a 201.

I think that's mostly reflected in the table (and that Slug is only a MAY), with the exception of having only Append on the container, in which case the server doesn't need to reveal the Location of the created resource.

Neither creating nor updating C/R should be allowed if you don't have write access to C/R itself.

Right. Same as above re default.

Only if the PATCH is an append-only patch

Right. Assuming INSERT DATA ("Inserting results in 2xx."). If payload includes DELETE DATA, it'll be 403. Related: https://github.com/solid/specification/issues/118#issuecomment-569648485 . Also pending https://github.com/solid/specification/issues/125 but I think we agree on the general direction.

All PATCH operations should be forbidden if you have neither write nor append on C/R

Right. Same as above re default.

You're also missing a few combinations in the table for PATCH.

I'll add those, thanks! [I held off on these because it wasn't entirely clear in the issues - IIRC]

Sep 03 '20 09:09 csarven

Novell gave the 401/404 choice to ACL Controllers with a permission called "File Scan". It was not paired with a principal, so you couldn't say that group X sees a 401 while everyone else sees a 404.

Overall, I feel like they had a pretty well-thought-out mapping of rights to actions which narrowed the vulnerability of someone deleting and replacing a file just to acquire ACLs control over it.

screenshot for those who ironically get a "page not found or you don't have access" error when trying to access the book:

[Image: Screenshot from 2020-09-03 06-48-04]

Sep 03 '20 13:09 d-a-v-i--

A valuable workflow in Google Drive is the "click here to request access to this resource". That is valuable for onboarding groups - typically it is hard for the owner to guess the ids of all the people they will want to share it with.

Sep 04 '20 15:09 timbl

@csarven The table entry for OPTIONS C/R should be changed into 2xx given

If a CORS check for request and response returns success and response’s status is an ok status, then: […] Otherwise, return a network error.

—https://fetch.spec.whatwg.org/#cors-preflight-fetch

Otherwise, we cannot answer preflight requests correctly.

Apr 04 '22 10:04 RubenVerborgh

CORS preflight requests are a vitally important consideration.

What I might suggest is this:

servers need to be able to distinguish between CORS preflight OPTIONS requests and all other OPTIONS requests. A server can do this by looking for the presence of three headers, which constitute CORS preflight requests:
all such CORS preflight requests always return a 2xx response (e.g., 200 or 204)
the response of a CORS preflight request gives away no information about the presence, absence or any distinguishing type of the target resource. In a word: all responses to CORS preflight requests are the same
all other OPTIONS requests consider authorization, will include resource-specific status codes (e.g., 200, 404, 403), and will include resource-specific headers (e.g., Link)
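
A sketch of the distinction proposed above, in Python. The list of headers did not survive in the comment as captured here, so this example assumes the two that the Fetch standard requires on a preflight (`Origin` and `Access-Control-Request-Method`); `is_cors_preflight` and `options_status` are hypothetical helpers:

```python
from typing import Mapping


def is_cors_preflight(method: str, headers: Mapping[str, str]) -> bool:
    """Heuristic check; the header set is an assumption based on Fetch."""
    names = {name.lower() for name in headers}
    return (
        method.upper() == "OPTIONS"
        and "origin" in names
        and "access-control-request-method" in names
    )


def options_status(method: str, headers: Mapping[str, str],
                   resource_status: int) -> int:
    """Return a uniform 2xx for preflights, else the resource-specific status."""
    if is_cors_preflight(method, headers):
        return 204  # same answer for every target, so nothing leaks
    return resource_status
```

Here `resource_status` stands for whatever the authorization-aware path would have produced (e.g., 200, 403, or 404); it is consulted only for non-preflight requests.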

Apr 04 '22 12:04 acoburn

This issue has been quiet for a while but I have a question as a result of working on the tests for read access controls. In the table for POST C/ Slug: R there are 2 cases that don't make sense to me. You have read access to the container and that may be inherited. You attempt to POST a new resource to a target child container. The table suggests that if the target exists you would get a 403, as you are not permitted to write to the target. However, it suggests that you would get a 404 if the target does not exist, since you have read access to the parent container.

# edited to clarify
Read, -, 403, 404
-, Read, 403, 404

I think this is a problem for a few reasons:

You attempted to write and that is forbidden, whether or not the target exists. The response should be a 403. If you then read the parent container you would indeed have permissions to see that the target didn't exist but why expose that information when it was not asked for? The agent is authorized to know about its existence but it isn't asking that.
Authorization should take precedence over the 404 - that would align with the http decision trees referenced in https://github.com/solid/specification/issues/146
It appears to conflict with the earlier statement

When an agent is forbidden to allocate a URI to a resource, 403 is used.

Sep 01 '22 15:09 edwardsph

The request semantics of POST (including Slug: R in this case, but not particularly important here) is to "perform resource-specific processing on the request payload" targeting a resource (i.e., a container in this case). The server does not have a current representation for the target resource, which is what the 404 indicates so that the client can try again by changing the request (if it wants to).

The content in issue 146 is not fully worked out and overlaps with the work in this issue, specifically the tables.

As mentioned elsewhere, 403 would be a valid (acceptable) response, but 404 is both accurate as per request semantics and more useful for the client.

Sep 01 '22 16:09 csarven

Ok, whilst the discussion is ongoing, I will at least allow 403,404 in the tests.
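
The two positions in this exchange differ only in the order of the checks. The Python sketch below contrasts them for POST C/; both functions are hypothetical reconstructions from the table and from the objection, and `-` on C/ is assumed to inherit from its parent:

```python
from typing import FrozenSet, Optional

Modes = Optional[FrozenSet[str]]  # None stands for "-" in the table
EMPTY: FrozenSet[str] = frozenset()


def _effective(parent: Modes, container: Modes) -> FrozenSet[str]:
    # Assumption: a container with no modes of its own inherits its parent's.
    return container if container is not None else (parent or EMPTY)


def _can_know(parent: Modes, container: Modes) -> bool:
    return "Read" in _effective(parent, container) or "Read" in (parent or EMPTY)


def post_status_table(parent: Modes, container: Modes, exists: bool) -> int:
    """Existence first: a missing target yields 404 if the agent may know it."""
    if exists:
        return 201 if "Append" in _effective(parent, container) else 403
    return 404 if _can_know(parent, container) else 403


def post_status_authz_first(parent: Modes, container: Modes, exists: bool) -> int:
    """Authorization first: without Append the answer is 403 either way."""
    if "Append" not in _effective(parent, container):
        return 403
    if exists:
        return 201
    return 404 if _can_know(parent, container) else 403


# What a tolerant test could accept for the two disputed rows.
ACCEPTED_WHEN_DISPUTED = {403, 404}
```

For the disputed rows (Read on the parent or on C/, no Append, target missing) the first function returns 404 and the second 403, so a test that accepts either passes against both readings.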

Sep 01 '22 17:09 edwardsph
