io
io copied to clipboard
`Hash.apply` provides the same hash for all directories
See here.
When a file is a directory, you get a FileNotFoundException
when instantiating a FileInputStream
with it. This makes it harder to write SBT tasks that need to invalidate some cache upon a file being added/removed from a directory.
Not sure if I have the full context on what situation would require directory hash, but as a reference Bazel also behaves in a similar way wherein if you use directory as input or output you get a warning saying "we're not gonna invalidate correctly for directory" or something along the lines.
See also https://www.scala-sbt.org/1.x/docs/Caching.html#Tracking+file+attributes and other example regarding caching.
Sorry for the late reply.
Not sure if I have the full context on what situation would require directory hash.
There's various reasons why you'd want it. Smithy4s, a library I maintain, abstracts the business logic away from SBT, as we also provide a CLI and a Mill plugin. Therefore, we essentially have a CodegenArgs => Seq[GeneratedFiles]
function that build tools can delegate to.
Now the thing is that we want to use whole CodegenArgs
datatype as a a cache key, because a change in any of the data it contains may impact the code rendering, and therefore any change should invalidate the cache. This CodegenArgs
datatype contains a bunch of things, including reference to files that may be indiscriminately files or folders. Because it doesn't JUST contain files, FileFunction
doesn't cut it for us.
We naively thought files via the functions provided by SBT would hash folders, like it happens in Mill's PathRef, turns out it wasn't the case and we had to take action, after a couple hours of debugging what was very puzzling behaviour.
Now whether Mill's way of hashing folders is actually desirable in SBT is not my place to say, but silently accepting folders and recovering by hashing an empty string is definitely not great, as it very much gives a false-positive behaviour. I think throwing an exception would be better, as at least the implementor could be made aware (via testing) of the problem, and take action. Logging a warning would be also acceptable, though easier to miss.