Case sensitivity of path/document names
The question came up in https://github.com/remotestorage/remotestorage.js/pull/1179 and I haven't found any mention of it in the spec. Has this been discussed before?
We should definitely mention it, because there is an expectation that URLs might be case-insensitive.
E.g. on GitHub they partially are:
- https://github.com/remotestorage/remotestorAge.js/blob/master/doc/contributing.rst works
- https://github.com/remotestorage/remotestorage.js/blob/master/doc/contriButing.rst does not
- domain names are case-insensitive
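For reference, the generic URL syntax (RFC 3986, section 6.2.2.1) treats the scheme and host as case-insensitive and assumes the other components are case-sensitive unless a scheme defines otherwise. GitHub accepting `remotestorAge.js` is an application-level choice layered on top of that. Below is a minimal Python sketch of the generic rule. The function name `same_resource` is hypothetical, and userinfo and fragments are ignored for brevity.

```python
from urllib.parse import urlsplit


def same_resource(a: str, b: str) -> bool:
    """Compare two URLs, folding case only where RFC 3986 allows it."""
    ua, ub = urlsplit(a), urlsplit(b)
    return (
        # urlsplit already lowercases the scheme; .hostname is lowercased too.
        ua.scheme.lower() == ub.scheme.lower()
        and (ua.hostname or "") == (ub.hostname or "")
        and ua.port == ub.port
        # Path and query are compared exactly, i.e. case-sensitively.
        and ua.path == ub.path
        and ua.query == ub.query
    )


canonical = "https://github.com/remotestorage/remotestorage.js/blob/master/doc/contributing.rst"
print(same_resource("https://GITHUB.com/remotestorage/remotestorage.js/blob/master/doc/contributing.rst", canonical))
print(same_resource("https://github.com/remotestorage/remotestorAge.js/blob/master/doc/contributing.rst", canonical))
```

Under this rule both of the GitHub example URLs above count as different resources from the canonical one, even though GitHub happens to serve the first of them. Only the host comparison ignores case.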
As a programmer, I lean towards saying URLs should be case-sensitive, because string comparison in most programming languages is case-sensitive. But I could be persuaded either way.
If we want to work with providers that use various filesystems, there is a bit of a problem:
| remotestorage choice | case sensitive backend | case preserving backend | case folding backend |
|---|---|---|---|
| case sensitive | trivial | hard¹ | easy² |
| case preserving | hard³ | trivial | easy² |
| case folding | hard³ | possible | trivial |
1. When looking up files, you need to check for collisions. This can be significantly more expensive than a case-sensitive lookup.
2. When making changes, you need to downcase every filename.
3. You need to decide how to store files whose names differ only by case. This can be done by encoding the filename (for example, with a base32 encoding that uses only lowercase letters).
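Footnotes 1 and 3 can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, `str.casefold()` stands in for whatever folding a backend actually uses, and the base32 variant (standard alphabet, lowercased, padding stripped) is just one possible choice.

```python
import base64
from collections import defaultdict


def find_case_collisions(names):
    """Footnote 1: group names that would clash on a case-insensitive backend."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def encode_name(name: str) -> str:
    """Footnote 3: encode a name so its stored form uses only a-z and 2-7.

    Distinct names get distinct encodings, so a case-folding backend can no
    longer merge them. The '=' padding is dropped.
    """
    return base64.b32encode(name.encode("utf-8")).decode("ascii").rstrip("=").lower()


def decode_name(encoded: str) -> str:
    """Reverse encode_name: restore uppercase and padding, then decode UTF-8."""
    padded = encoded.upper() + "=" * (-len(encoded) % 8)
    return base64.b32decode(padded).decode("utf-8")


print(find_case_collisions(["Foo", "foo", "bar"]))
print(encode_name("Foo"), encode_name("foo"))
```

Here `"Foo"` and `"foo"` become `izxw6` and `mzxw6`, which no longer collide. The trade-off is size: base32 turns every 5 bytes into 8 characters, so long names can hit filesystem name-length limits sooner.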
Looking at it this way, the optimal solution for implementers is case folding. However, this causes several problems for applications and users:
- Users may expect to be able to store things that differ only by case (especially in languages with non-bijective folding rules, such as German, where "ß" folds to "ss").
- At this point you are basically required to do full Unicode case folding [citation needed], which makes everything more difficult. (But you probably need this anyway, unless you are treating paths as byte strings.)
- Many applications will now need to store the case some other way (if they want to use the human names in the storage path).
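To make the Unicode point concrete, here is a small Python comparison between naive lowercasing and the Unicode "canonical caseless match", which is defined as NFD(casefold(NFD(x))). Python's `str.casefold()` implements full default case folding; the two key functions are hypothetical names for this sketch, and the results depend on the Unicode version bundled with the interpreter (`unicodedata.unidata_version`).

```python
import unicodedata


def simple_lower_key(s: str) -> str:
    # What a naive implementation might do: lowercase and nothing else.
    return s.lower()


def canonical_caseless_key(s: str) -> str:
    # Unicode canonical caseless matching: NFD(casefold(NFD(s))).
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())


# "ß" only folds to "ss" with full case folding, so these two names collide
# under full folding but not under simple lowercasing.
print(simple_lower_key("Straße") == simple_lower_key("STRASSE"))
print(canonical_caseless_key("Straße") == canonical_caseless_key("STRASSE"))

# Precomposed "é" (U+00E9) versus "e" + combining acute (U+0301).
print(simple_lower_key("\u00e9") == simple_lower_key("e\u0301"))
print(canonical_caseless_key("\u00e9") == canonical_caseless_key("e\u0301"))
```

A server that folds paths this way cannot store both "Straße" and "STRASSE", which is exactly the non-bijective case from the first bullet. A server that only calls `lower()` gets different answers from one that does full folding.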
With those things considered, I think we should treat paths as byte strings. This makes it easy for servers to build fast, accurate implementations of remotestorage. It does, however, pass folding and normalization on to app developers. I think that can be solved with a couple of good libraries, and it will be a lot less painful than tracking down remotestorage implementations that do folding wrong (or just use an older Unicode standard).
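A sketch of how the byte-string approach might split the work, assuming UTF-8 paths. The server compares raw bytes. A hypothetical app-side helper (`client_path`, not part of remotestorage.js or the spec) normalizes each segment to NFC before building the path. Without that step, "café" typed with a precomposed "é" and "café" typed as "e" plus a combining accent would be two different documents.

```python
import unicodedata
from urllib.parse import quote


def server_key(path: str) -> bytes:
    # Server side: the key is just the UTF-8 bytes of the path.
    # No folding and no normalization, so lookups are exact and cheap.
    return path.encode("utf-8")


def client_path(*segments: str) -> str:
    # Hypothetical app-side helper: normalize every segment to NFC so that
    # visually identical names always produce the same bytes.
    return "/".join(unicodedata.normalize("NFC", s) for s in segments)


# Same visible name, different bytes: the server sees two documents.
print(server_key("notes/caf\u00e9") == server_key("notes/cafe\u0301"))

# After client-side normalization both spellings map to one path,
# which is then percent-encoded as UTF-8 for the request URL.
print(quote(client_path("notes", "cafe\u0301")))
```

NFC is only an assumption in this sketch. For app libraries to interoperate, they would need to agree on one normalization form (or explicitly on none), and apps that want case-insensitive names could add `casefold()` in the same helper.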
There are downsides though:
- Harder to do a trivial filesystem-based implementation if your filesystem isn't case-sensitive.
- Dropbox and other shims might not work? I don't know what normalization, if any, they apply to the case-preserving name.
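For the first downside, a filesystem-backed server could at least detect the problem instead of silently merging documents. The sketch below (the function name is hypothetical) creates a temporary mixed-case file and checks whether the case-swapped name resolves to it. It probes the actual storage directory because some filesystems allow case sensitivity to be configured per directory.

```python
import os
import tempfile


def is_case_sensitive(directory: str) -> bool:
    """Return True if names in `directory` are looked up case-sensitively."""
    # The mixed-case prefix guarantees that swapcase() yields a different name.
    with tempfile.NamedTemporaryFile(prefix="CaseProbe", dir=directory) as probe:
        name = os.path.basename(probe.name)
        swapped = os.path.join(directory, name.swapcase())
        # On a case-insensitive filesystem the swapped name finds the probe file.
        return not os.path.exists(swapped)
    # The probe file is deleted when the with-block exits.


print(is_case_sensitive(tempfile.gettempdir()))
```

A server could run this check at startup and either refuse to run or switch to an encoding scheme like the base32 one from footnote 3 when it returns False.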
We should probably also check with major remotestorage providers to see what they do and whether it would be hard for them to migrate to or support the standardized behavior.