fs icon indicating copy to clipboard operation
fs copied to clipboard

Add explainer for FileSystemHandle.getUniqueId()

Open a-sully opened this issue 3 years ago • 14 comments

  • [ ] At least two implementers are interested (and none opposed):
    • Chrome
  • [ ] Tests are written and can be reviewed and commented upon at:
    • https://github.com/web-platform-tests/wpt/blob/master/fs/script-tests/FileSystemBaseHandle-getUniqueId.js
  • [ ] Implementation bugs are filed:
    • Chrome: https://crbug.com/1342961
    • Firefox: …
    • Safari: …

a-sully avatar Aug 10 '22 22:08 a-sully

Minor nit: should be Id, not ID. https://w3ctag.github.io/design-principles/#casing-rules

domenic avatar Aug 11 '22 03:08 domenic

Why "Id" and not "Uri"?

This identifier the browser provides is not "universal", since it will provide different strings to different sites for the same file. "Id" is a bit more generic, so I went with that.

Happy to consider other names for this method, though!

a-sully avatar Aug 17 '22 01:08 a-sully

The "universality" in URIs does not uniqueness in correspondance, either from the reference to the referred object or the converse. file:///etc/passwd is universal, even if the same reference refers to different objects. And it is very common to see multiple references (or aliases) for the same resource on the web.

martinthomson avatar Aug 17 '22 05:08 martinthomson

I'm not well-versed in this debate so bear with me here, but my intuition is that "URI" implies it can be used to fetch the resource, which is not the case here. The browser provides the identifier, but after that it's only useful as a tool for the developer to structure their databases.

If we planned to expose a getHandleFrom{Id|Uri}() method, then I agree that "URI" might be the right term. But that's explicitly a non-goal of this proposal...

a-sully avatar Aug 20 '22 01:08 a-sully

  1. This repository is not a good place to host explainers I think. At least that's not something we've traditionally done for WHATWG standards.
  2. If for some reason we wanted this field to be a URL, we shouldn't call it URI: https://url.spec.whatwg.org/#url-apis-elsewhere.

annevk avatar Aug 29 '22 12:08 annevk

  1. This repository is not a good place to host explainers I think. At least that's not something we've traditionally done for WHATWG standards.

Happy to move the explainer elsewhere. Where would you prefer it be hosted?

  1. If for some reason we wanted this field to be a URL, we shouldn't call it URI: https://url.spec.whatwg.org/#url-apis-elsewhere.

Since this method provides a completely opaque identifier, I would assume we don't want it called URL or URI? (which to me implies use for fetching)

a-sully avatar Aug 29 '22 16:08 a-sully

I'm not sure. Maybe @domenic has thoughts on this. Personal repository seems fine to me. Eventually the examples and specification text should become part of the specification through a PR.

(As for URL vs ID, I was mainly replying to the earlier conversation. I'm not sure I see the use for a URL either.)

annevk avatar Aug 31 '22 07:08 annevk

I personally think explainers are fine in WHATWG repos, e.g. we have them in whatwg/streams. They can be a crutch, however, leading to people avoiding putting the useful text into the spec itself; so please do avoid that failure mode.

IMO it should be up to the repo editors to decide on things like that.

domenic avatar Aug 31 '22 08:08 domenic

So... "IDs should not be stable after clearing browsing data. In other words, this should not be a truly persistent, unclearable identifier"

But hashing the output of resolve() provides a unique identifier already (a full path to the root of the OPFS store), and would remain the same regardless of clearing.

We should be clear about what we're buying with this, and what it actually accomplishes. Obfuscating the hash value here doesn't actually add any security/privacy, at least in the OPFS case. Adding restrictions that don't actually help users just complicates the implementations with no benefit. Are there any risks here?

Also, if we were to expose our internal IDs (with or without hashing/salting/etc) for objects, that would lock us into the existing ID scheme since these need to be stable; to be safe we'd implement an entirely independent ID scheme which could be guaranteed to be stable for all time (or we'd have to blow away older OPFS files on an internal ID scheme change). But that's more complexity yet (though we could defer the complexity there until we need to do the first change to the ID scheme, if ever)

jesup avatar Aug 31 '22 18:08 jesup

Expose our internal IDs

If at all possible, I would recommend that the standard require IDs to be returned as v4 (random) UUIDs, which act as a facade over any internal IDs. This guarantees a cross-browser-compatible format which developers can't take any dependency on.

domenic avatar Aug 31 '22 23:08 domenic

You're correct that in the OPFS case this doesn't add anything. The scenarios where this does matter are for files outside OPFS, where this adds significant new capabilities in a privacy-preserving way (i.e. without exposing the full file path).

So... "IDs should not be stable after clearing browsing data. In other words, this should not be a truly persistent, unclearable identifier"

But hashing the output of resolve() provides a unique identifier already (a full path to the root of the OPFS store), and would remain the same regardless of clearing.

As mentioned in the explainer, it's currently not possible to distinguish between two files of the same name in different directories for files outside of the OPFS (since the site may not have access to a shared ancestor directory).

We should be clear about what we're buying with this, and what it actually accomplishes. Obfuscating the hash value here doesn't actually add any security/privacy, at least in the OPFS case. Adding restrictions that don't actually help users just complicates the implementations with no benefit. Are there any risks here? Also, if we were to expose our internal IDs (with or without hashing/salting/etc) for objects, that would lock us into the existing ID scheme since these need to be stable; to be safe we'd implement an entirely independent ID scheme which could be guaranteed to be stable for all time (or we'd have to blow away older OPFS files on an internal ID scheme change). But that's more complexity yet (though we could defer the complexity there until we need to do the first change to the ID scheme, if ever)

The idea is that the hash is based on things that are stable: the handle type and path, the origin, and some salt that hangs around until the user clears site data. The biggest risk I see is that some future change (something analogous to the Origin -> StorageKey migration, say) would necessitate changing the ID to incorporate this new information... but users can invalidate the ID by clearing site data anyways. For sites using the ID in a client-side database, this is behavior they'll have to be resilient against regardless. So this doesn't feel like a huge risk?

Your concerns do make sense. That said, we do have a need for this for outside-OPFS scenarios. I appreciate your thinking through this with pragmatic suggestions like deferring the complexity until we do need to change the scheme. But before we commit to anything, do you have alternate suggestions?

a-sully avatar Sep 01 '22 23:09 a-sully

If at all possible, I would recommend that the standard require IDs to be returned as v4 (random) UUIDs, which act as a facade over any internal IDs. This guarantees a cross-browser-compatible format which developers can't take any dependency on.

Seems reasonable to me. I'll add this to the explainer soon if no one objects

a-sully avatar Sep 01 '22 23:09 a-sully

random UUIDs could help avoid accidental exposure of info, and as mentioned avoid accidental dependence on a particular hashing format by developers using this.

Does this buy any advantage to spec and implement for OPFS? As you say,

You're correct that in the OPFS case this doesn't add anything. The scenarios where this does matter are for files outside OPFS, where this adds significant new capabilities in a privacy-preserving way (i.e. without exposing the full file path).

So what is the reasoning to add this to OPFS?

Is it "developers can assume getUniqueId() is available to use whether using OPFS or WICG FS Access?" Does that usecase matter to developers?

jesup avatar Sep 30 '22 16:09 jesup

Thinking about this a bit more, I do think there's value to this method within the OPFS. At least there might be, depending on what we decide getUniqueId() returns a unique ID to. As mentioned here https://github.com/whatwg/fs/issues/39#issuecomment-1248670783, it's unspecified at this point whether a FileSystemHandle is conceptually a path or a reference...

If it's just a path, getUniqueId() provides no value within the OPFS, since calling resolve() on the root directory generates a unique ID (the file path).

If it's a reference, getUniqueId() could track a file even it's moved.

Is it "developers can assume getUniqueId() is available to use whether using OPFS or WICG FS Access?" Does that usecase matter to developers?

I can't speak for all developers, but I would imagine this does. At the very least it would be surprising if the method was only available for handles outside of the OPFS. By adding the getUniqueId() to the WICG spec, the method would be available on all handles, including those in the OPFS.

a-sully avatar Oct 04 '22 18:10 a-sully