cabal icon indicating copy to clipboard operation
cabal copied to clipboard

Make Cabal agnostic about working directory

Open sheaf opened this issue 1 year ago • 4 comments

Template Α: This PR modifies cabal behaviour

This PR tackles #9702, making the library functions in Cabal agnostic to the working directory. In practice, this means that we distinguish FilePaths from un-interpreted SymbolicPaths. The latter may be paths that are relative to the working directory, and need to be interpreted with respect to a passed-in argument specifying the working directory, instead of using the working directory of the current process. See Note [Symbolic paths] in Distribution.Utils.Path.

In particular, paths in the package description now use the SymbolicPath abstraction, which allows specifying whether they are allowed to be absolute, and, if they are relative, what they are relative to. For example, source files are relative to a source search directory, data files are relative to the data directory, and doc files are relative to the package root.

Review notes

For review, I would recommend starting by reading the notes and comments in Distribution.Utils.Path.

The main goal is to not set the working directory when calling the Cabal library internally when building a package in cabal-install, and instead passing it separately (currently via a global flag to the Setup executable). To enable this, the entire Cabal library needs to be able to interpret file paths with respect to a passed-in working directory, instead of using the working directory of the current process. This means all interactions with the file system (e.g. doesFileExist, rewriteFile etc), as well as all invocations of external processes (such as ghc) need to be scrutinised for correctness.


TODO:

  • [x] More extensive testing, e.g. compiling stackage.

QA notes

The testing strategy for this patch is simply to build, test & benchmark packages. The main difference is that we no longer set the working directory when invoking the Cabal library internally (see internalSetupMethod), relying instead on passing -working-dir.

sheaf avatar Feb 16 '24 12:02 sheaf

I have compiled clc-stackage using this branch. It showed up a single failure which is unrelated to this patch, which I filed separately as #9777 .

I now consider this MR ready for review.

sheaf avatar Mar 04 '24 14:03 sheaf

I think this patch is a very good improvement to the cabal codebase, and I believe in its correctness additionally because of the robust testing strategy.

I think it is ready.

@andreabedini would you care to comment/review this patch, so we can proceed swiftly to merging before it starts bitrotting?

alt-romes avatar Mar 12 '24 15:03 alt-romes

It might make sense to split it into multiple reviewers by chunks, since it's an awfully big gulp for a single reviewer.

geekosaur avatar Mar 12 '24 15:03 geekosaur

It might make sense to split it into multiple reviewers by chunks, since it's an awfully big gulp for a single reviewer.

I understand why you say that, but the way I think of this patch is that it boils down to changing every direct interaction with the filesystem (e.g. doesDirectoryExist, createFile etc) to account for a working directory argument. I don't think it's possible to audit every such change, which is why I focused on testing the patch extensively. Instead, I think a review should focus on the overall new design and direction of travel, without necessarily looking at every changed file.

sheaf avatar Mar 12 '24 16:03 sheaf

Sorry for the extended radio silence. I am not going to read this line by line but I want to commend @sheaf for the great and detailed work.

I have to admit I feel slightly dissatisfied by AllowAbsolute, I feel the information on whether the path is (or can be) absolute should go along with from :: Symbol. I do not understand what Dir "Package" tells me in presence of AllowAbsolute?

Does the type SymbolicPathX AllowAbsolute (Dir "Package") (File "Source") mean that the file path can be either:

  • relative to "Package", e.g. src/Utils/Path.hs
  • absolute and extending inside "Package" e.g /home/user/workdir/cabal/Cabal-syntax/src/Utils/Path.hs ?

If I have

sp = SymbolicPath "/some/directory/Cabal-syntax/Cabal-syntax.cabal" :: SymbolicPathX AllowAbsolute (Dir "Package") (File ".cabal")`

then (for all to)

takeDirectory sp :: SymbolicPathX AllowAbsolute (Dir "Package") (Dir to)

Maybe it could be Dir Unknown instead? (We could consider that the cabal file is always in the package root directory but that's another story).

Is there any chance we end up going past the package directory and produce something like the following?

SymbolicPath "/some/directory" :: SymbolicPathX AllowAbsolute (Dir "Package") (Dir _)`

Semantically speaking AllowAbsolute subsumes Reative so I would find more natural to use polymorphism, i.e.

newtype SymbolicPath (root :: k) (to :: FileOrDir) = SymbolicPath FilePath
data Absolute
data RelativeTo (what :: Symbol)

SymbolicPath "src/Utils/Path.hs" :: SymbolicPath (RelativeTo "Package") (File "Source")

This would be similar to the design in hackage-security.

andreabedini avatar Mar 18 '24 09:03 andreabedini

This would be similar to the design in hackage-security.

@andreabedini Thanks for the pointer, I wasn't aware that hackage-security vendored its own Path abstraction.

As for your comment about keeping track of whether a path is absolute or relative in the type: the most common use-case within Cabal is of a path that is either absolute or relative to a particular directory. For example, source files generated by preprocessors end up in the build directory, which can well be outside of the current package directory; see e.g. preprocessFile. This means that, for example, Haskell source search directories are "either absolute or relative to the package". I opted for a design that reduced friction for this particular-use case; a design that kept track of whether a path is absolute or relative in the type would introduce a lot of existentials, wrapping/unwrapping, RankNTypes, and so on.

Is there any chance we end up going past the package directory [...]

Yes, and in fact I notice this is the same as hackage-security. I don't propose to change the API to prevent this problem; I guess you would keep track of how many levels deep something is in the type, takeDirectory would only work if you are >= 1 levels deep and decrease the level by 1. That doesn't seem worth it, I don't think there are any bugs surrounding that currently.

sheaf avatar Mar 25 '24 12:03 sheaf

To reiterate, the concept that seems most useful, at least for the purposes of this working directory change, is the notion of a filepath which, if it is relative, is relative to the package. This is what the SymbolicPath from to type definition provides.

An existential such as

data MyPath to where
  RelPath :: RelativePath from to -> MyPath to
  AbsPath :: AbsolutePath to -> MyPath to

loses this information: if I have a MyPath to in hand, and it ends up being a relative path, I have forgotten what it was relative to.

sheaf avatar Mar 25 '24 12:03 sheaf

I have updated the SymbolicPath API thusly:

  • we no longer distinguish different kinds of files (in practice it didn't seem to catch any bugs, and made some things a bit tedious),
  • directories are back to using datatypes instead of Symbol,

I believe these changes make the API easier to use and should address @fendor's review. It means we no longer need a naming convention or worry about capitalisation, and the individual abstract directories now have appropriate Haddock documentation.

I have not made any changes involving AllowAbsolute, as I believe the current design is more appropriate for the needs of Cabal.

I am currently re-building stackage as a safety check. Edit: success.

sheaf avatar Mar 26 '24 13:03 sheaf