Re-design the ProgramDb abstraction to avoid accidentally dropping unconfigured programs
We have
data ProgramDb = ProgramDb
{ unconfiguredProgs :: UnconfiguredProgs
, progSearchPath :: ProgramSearchPath
, configuredProgs :: ConfiguredProgs
}
-- | Note that this instance does not preserve the known 'Program's.
-- See 'restoreProgramDb' for details.
instance Binary ProgramDb where
put db = do
put (progSearchPath db)
put (configuredProgs db)
get = do
searchpath <- get
progs <- get
return $!
emptyProgramDb
{ progSearchPath = searchpath
, configuredProgs = progs
}
Note here that the Binary instance for ProgramDb discards the unconfigured programs. This is because arbitrary functions can be stored in that field (e.g. programFindVersion :: Verbosity -> FilePath -> IO (Maybe Version)).
However, this easily leads to bugs; #2241 was an example. I ran into another bug today, which was that the configureCompiler function in cabal-install would also drop unconfigured programs, which could lead to e.g. failing to find the ar executable (The program 'ar' is required but it could not be found.).
One first step which could help avoiding bugs is to define a newtype around ProgramDb and attach the Binary instance to that newtype instead. The wrapping/unwrapping of the newtype would hopefully introduce the appropriate amount of ceremony to avoid accidents.
More ambitiously, we could re-design ProgramDb, e.g. by introducing a serialisable DSL for unconfigured programs, or perhaps by using static pointers.
Let me check if I understand. When we cache programdb to disk, we forget about programs that have been added to the database but not configured. And this somehow means we cannot configure them later when we restore the cache?
Let me check if I understand. When we cache programdb to disk, we forget about programs that have been added to the database but not configured. And this somehow means we cannot configure them later when we restore the cache?
Yes, that's correct.
I think there are a lot of bugs lurking around here. For example, in Distribution.Client.ProjectPlanning.rebuildInstallPlan, we end up serialising a ProgramDb to disk for use in caching/recompilation avoidance. This will discard unconfigured programs, which leads to a slew of subtle bugs. I am trying to improve the situation in #9967, but I am just papering over the problem. Really, we should re-design the abstraction entirely to avoid accidentally making use of its lossy Binary instance.