filepath
filepath copied to clipboard
Improve normalise
When reviewing the Shake normalise function, I realised the implementation I had was all wrong. I should backport the appropriate Shake tests to the FilePath library and see if that also needs improvements.
I had a brief look at the tests in https://github.com/ndmitchell/shake/blob/master/Test/FilePath.hs#L33-L62
Notable discrepancy:
Shake.FilePath: norm "a/." === "a"
System.Filepath: normalise "a/." == "a/"
Special handling of trailing dots was added to System.FilePath in 3815668a4ad7f10f864ad39a6231674a7faf8521.
Looking at the trailing slash behaviour of normalise, I'm not certain what the right answer should be. Is going from trailing slash to not too much change for normalise?
Since normalise "a/" == "a/"
, I think normalise "a/."
should also be "a/"
, not "a"
. I don't quite understand the Shake version, maybe it should be changed to keep the trailing slash.
Quite possibly, I'd like the Shake and non-Shake versions to end up equivalent where they overlap - and some changes to the Shake version is fine.
So I think my underlying uneasiness all comes down to what a/.
should be represented as. Essentially, you can chose to preserve the presence/absence of a trailing slash (and go for a
), or preserve knowledge that it's a directory (and go for a/
). What worries me is we don't have a function to test if something appears to be a directory or not, but we do have functions to talk about trailing slashes - perhaps that was a design flaw at the very beginning.
In Shake I jumped for preserving the presence/absence of a trailing slash, in FilePath we seem to be preserving directory information. I'm still trying to figure out in my head which is better.
There are lots of things here that aren't quite right, comparing to the Shake version (which also has a few minor trailing slash and drive issues). A few examples on Windows.normalise
:
-
/////
is\\\\\
(literally, not escaped, so 5 instances of\
) - should probably be\
-
/a:
isa:
, should probably be unchanged, but this breaks the idempotence - sincea:
renormalises toA:
. -
/a:/
isa:/
- should probably be\a:\
.
I think the "right" solution is probably to rethink the whole library to be based around:
data Lexeme = Drive Char | UNC String | Separator Char | Path String
parse :: String -> [Lexeme]
display :: [Lexeme] -> String
display = concatMap $ \x -> case x of
Drive x -> [x,':'] -- x will be an ASCII character
UNC x -> '\\':'\\':x -- x will be a UNC name, not containing any pathSeparators
Separator x -> x -- x will be a member of pathSeparators
Path x -> x -- x will not contain any pathSeparators
Where we rely on the property display . parse == id
. Writing normalise is then pretty trivial, and things like changing combine
as per #8 would become feasible.
Does anyone (e.g. @thomie) think that seem sensible? Or a bad idea? Or too disruptive?
Those are all invalid filepaths:
*System.FilePath.Windows> isValid "/////"
False
*System.FilePath.Windows> isValid "/a:"
False
*System.FilePath.Windows> isValid "/a:/"
False
I also get a different result for this one:
*System.FilePath.Windows> normalise "/////"
"\\\\\\\\\\"
But I like your idea. splitDrive
is particularly brittle at the moment. We could take it even further: reject invalid filepaths from being parsed. Not sure what the implication would be, but can't we make filepath
a wrapper around system-filepath?
By \\\\\
I meant it printed \\\\\\\\\\
- I was removing the escape characters to print it literally.
Since the filepath library is one of the GHC Core libraries it's very important that we don't break reverse compatibility other than for bugs, so we'll have to support invalid paths forever (I personally think that's a good thing, in certain circumstances things which would otherwise be invalid can be useful to do some subset of operations on). We also can't depend on system-filepath, since that isn't a Core library (it also does a few things I'm not a fan of). I'm not suggesting we expose the parse
/display
functions or Lexeme
data type.
By \\ I meant it printed \\\ - I was removing the escape characters to print it literally.
Ok, I was just confused for some reason. Still, it's not a valid filepath, so normalise can do whatever.
Even core libraries can be changed, after an accepted proposal on the libraries list, but I see what you're saying.
It would be nice if the section "Should FilePath by an abstract data type?" from the old readme would come back, with an explanation why invalid paths can sometimes be a good thing.
Agreed, I should resurrect that section, but make it up to date (before it was all about proposals, and before system-filepath existed).
I've updated the README with an abstract filepath section. Work still needed on normalise.
Idea is to do this after #53.
Should C:\\\\
be normalised to C:\
? Windows API (e.g. CreateFile
) seems treat them the same.
@Rufflewind - yes, probably. That said, there are plenty of holes in normalise that could do with being addressed...
> normalise "\\??????????????\\D:\\lol"
"D:\\lol"
> normalise "\\abc\\D:\\lol"
"D:\\lol"
both paths are invalid, yet normalise
makes them valid.
Maybe normalise
should be a no-op for invalid paths.
Should
C:\\\\
be normalised toC:\
? Windows API (e.g.CreateFile
) seems treat them the same.
Just hit this issue today. Could we possibly fix at least this inconsistency, if a larger rewrite is not on cards?
Another helpful option would be at least to document known caveats with existing normalise
.
Normalise is a mess. I'm not even sure what are all the expected properties.
Rewriting it in-place isn't possible. We'll have to rename the function and deprecate the old.