Defer merging of common stanzas
This is the second part of the exact-print parser work. We changed the parser so that it no longer merges common stanzas during the parsing stage.
Depends on #11285 to update the test suite (after which the diff will be significantly smaller).
## Background
Common stanzas are merged into their corresponding sections during the parsing stage. This merging process retains neither the definitions of the common stanzas (i.e. `common <name>`) nor where they are imported (`import: <name>`). To achieve exact printing and to be able to programmatically update the import list, we need to retain this information.
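For reference, a minimal example of the syntax in question (a `common` stanza defining reusable fields, imported by a component):

```cabal
common warnings
  ghc-options: -Wall

library
  import: warnings
  exposed-modules: Foo
```

After parsing, the pre-patch representation only shows the library with `ghc-options: -Wall` merged in; both the `warnings` stanza and the `import:` line are gone.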
This PR remedies exactly that by deferring the merging process.
## Implementation
We add a new field `gpdCommonStanza :: Map ImportName (CondTree ConfVar [Dependency] BuildInfo)` to the `GenericPackageDescription` data type to track all the defined common stanzas. The merging runs only when the new accessors are called on a `GenericPackageDescription`.
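Schematically, the record gains a shape along these lines (a sketch only: surrounding fields are elided, everything except `gpdCommonStanza` is illustrative, and `ImportName` here is a placeholder for the patch's own type):

```haskell
import Data.Map (Map)
import Distribution.Types.BuildInfo (BuildInfo)
import Distribution.Types.CondTree (CondTree)
import Distribution.Types.ConfVar (ConfVar)
import Distribution.Types.Dependency (Dependency)
import Distribution.Types.PackageDescription (PackageDescription)

type ImportName = String -- placeholder; the patch defines its own type

data GenericPackageDescription = GenericPackageDescription
  { packageDescription :: PackageDescription
  , gpdCommonStanza :: Map ImportName (CondTree ConfVar [Dependency] BuildInfo)
    -- ^ every common stanza, by name, kept unmerged
  -- , ... the component condtrees, also kept unmerged (see below)
  }
```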
We also added a bidirectional pattern synonym that preserves the old record syntax, so the impact on existing code should be very low. Comparing against my PR to update the test suite is promising: https://github.com/leana8959/cabal/compare/accessor-tests...leana8959:cabal:no-flatten-package-description.
Please let me know your thoughts :)
Checklist below:
This PR modifies behaviour or interface
Include the following checklist in your PR:
- [x] Patches conform to the coding conventions.
- [ ] Any changes that could be relevant to users have been recorded in the changelog.
- [ ] Is the change significant? If so, remember to add `significance: significant` in the changelog file.
- [ ] The documentation has been updated, if necessary.
- [ ] Manual QA notes have been included.
- [ ] Tests have been added. (Ask for help if you don’t know how to write them! Ask for an exemption if tests are too complex for too little coverage!)
Here are the benchmark results from timing the `./validate.sh --partial-hackage-tests` run.
- baseline (`master` branch)

  ```
  $ hyperfine --setup "./validate.sh" "./validate.sh --partial-hackage-tests"
  Benchmark 1: ./validate.sh --partial-hackage-tests
    Time (mean ± σ):     224.440 s ±  5.713 s    [User: 191.896 s, System: 34.800 s]
    Range (min … max):   218.596 s … 239.915 s    10 runs
  ```

- `no-flatten-package-description`

  ```
  $ hyperfine --setup "./validate.sh" "./validate.sh --partial-hackage-tests"
  Benchmark 1: ./validate.sh --partial-hackage-tests
    Time (mean ± σ):     232.745 s ± 10.650 s    [User: 194.793 s, System: 38.621 s]
    Range (min … max):   220.815 s … 254.096 s    10 runs
  ```
I'm not sure why validate is used for benchmarking parser changes. Validate is likely dominated by completely unrelated factors (e.g. the solver). Would it be possible to just download a bunch of cabal files from Hackage (see clc-stackage for a curated set of packages) and run only the parser on them?
@ulysses4ever
> Would it be possible to just download a bunch of cabal files from Hackage (see clc-stackage for a curated set of packages) and run only the parser on them?
The options `--{partial,complete}-hackage-tests` call the hackage test. These tests run the `readFields` and parsec parsers against the entire Hackage index. I'll look into clc-stackage as well, but it seems like hackage-tests is already exactly what we want here?
> Validate is likely dominated by completely unrelated factors (e.g. the solver)
I thought it would be important to test whether I broke the solver by changing the representation of GPD, which is why I tested everything :)
> The options `--{partial,complete}-hackage-tests` call the hackage test.
These are tests, which is different from benchmarking. You seem to be using validate for benchmarking, which makes no sense to me, as explained above. On the other hand, using validate for tests may have some value, although it is preferable to use the actual test suite, like hackage-tests, which you do, so I don't have any issue with your approach to testing.
> Would it be possible to just download a bunch of cabal files from Hackage (see clc-stackage for a curated set of packages) and run only the parser on them?
This is pretty much what the hackage-tests test does. Do you think timing this alone would be more acceptable (compared to ./validate), even though it's not a benchmark?
If timing hackage-tests would mean timing download times too, then no, it doesn't sound right. I don't know if you can arrange for a separate download, which would make it more reasonable. Generally, if you want an estimate of the performance footprint, you have to develop a benchmark; you can't just hijack test executables wholesale. You could probably reuse some of the testing infrastructure (tasty-bench exists for a reason). If this sounds like too much, you should just stop trying to achieve it, because currently you're posting numbers for validate that most likely have no connection to the actual performance change, and as such are actively misleading. It'd be great to have some very simple benchmark that shows that we don't degrade performance significantly, but other than that we should focus on design rather than performance, I think: the design issue has been a die-hard for a long time...
> If timing hackage-tests would mean timing download times too, then no, it doesn't sound right. I don't know if you can arrange for a separate download, which would make it more reasonable.
hackage-tests looks for `$XDG_CACHE_HOME/cabal/packages/hackage.haskell.org/01-index.tar` and reads from it. It doesn't do the downloading (it would fail if the file doesn't exist and tell the user to run `cabal update`).
> If this sounds like too much, you should just stop trying to achieve it, because currently you're posting numbers for validate that most likely have no connection to the actual performance change, and as such are actively misleading.
I totally understand. I can make sure `cabal update` is called before I invoke the test suite; does this make it more reasonable? It seems like the hackage test suite is doing what we want, which is parsing a huge collection of cabal files. I can also restrict it to only the read-fields parser (the enveloping cabal format) or the parsec parser (the actual cabal grammar). If not, I'll stop testing performance with hackage-tests :+1:
Otherwise, do you have an example of using tasty-bench to benchmark something similar in cabal? We wanted to add a benchmark because people were concerned about performance during the tech-proposal phase.
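For concreteness, a harness of the kind being discussed might look like the following. This is a sketch only, not part of the PR: the file list is a placeholder (a real harness would enumerate files extracted from `01-index.tar`), and it assumes the entry points exported from `Distribution.PackageDescription.Parsec` plus the `NFData` instances Cabal-syntax provides for the result types.

```haskell
-- Sketch: time only the parsec parser on a fixed set of .cabal files.
module Main (main) where

import qualified Data.ByteString as BS
import Distribution.PackageDescription.Parsec
  (parseGenericPackageDescription, runParseResult)
import Test.Tasty.Bench (bench, defaultMain, nf)

cabalFiles :: [FilePath]
cabalFiles = ["Cabal-syntax/Cabal-syntax.cabal"] -- placeholder list

main :: IO ()
main = do
  contents <- traverse BS.readFile cabalFiles -- read outside the timed region
  defaultMain
    [ bench path (nf (runParseResult . parseGenericPackageDescription) bs)
    | (path, bs) <- zip cabalFiles contents
    ]
```

Reading the files up front keeps I/O out of the measured region, which is the main thing a validate-based timing can't do.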
How does the bidirectional pattern synonym work? If the accessor flattens the structure then the setter won't know whether to update the common stanza or the specific component. This seems like a big foot-gun in the API if the goal is to allow people to manipulate GenericPackageDescription. How much are these pattern synonyms used for setting fields? Should we rewrite those situations to work properly with common stanzas?
It would be helpful if you could lay out some comments pointing out how to review the patch since it is rather large.
> How does the bidirectional pattern synonym work?
The part before the `where` keyword defines how the synonym expands in patterns; the part after `where` defines its expansion in expressions. Having both defined allows us to use the pattern's fields exactly like normal record fields [^1].
We use the PatternSynonyms extension in conjunction with ViewPatterns to call the merging functions during matching, and the merging is done with the same logic as before this patch.
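A minimal, self-contained model of the mechanism (a sketch with illustrative names, not the actual Cabal definitions):

```haskell
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE ViewPatterns #-}

-- Toy internal type: one raw (unmerged) field stands in for the real
-- unmerged condtrees plus the common-stanza map.
data GPD = MkGPD {rawLib :: [String]}
  deriving (Eq, Show)

-- Stand-in for the real import-resolution logic.
mergeLib :: GPD -> [String]
mergeLib (MkGPD xs) = "from-common-stanza" : xs

-- Explicitly bidirectional record pattern synonym: matching (before
-- 'where') merges on the fly via a view pattern; constructing (after
-- 'where') treats its argument as already-resolved data.
pattern GenericPackageDescription :: [String] -> GPD
pattern GenericPackageDescription{condLibrary} <- (mergeLib -> condLibrary)
  where
    GenericPackageDescription lib = MkGPD lib

-- The field doubles as a selector that returns merged data:
--   condLibrary (MkGPD ["lib-only"]) == ["from-common-stanza", "lib-only"]
-- Record update also works, rebuilding through the constructor side:
--   (MkGPD ["a"]) {condLibrary = ["b"]} == MkGPD ["b"]
```

Note that in this toy the getter merges again after an update, which is exactly the setter ambiguity discussed below; presumably the real constructor side produces a description with an empty common-stanza map, so its argument is taken as already merged.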
The pattern synonym was introduced to maintain backward compatibility of the record syntax. It lets us leave the majority of the existing code unchanged (except, for example, traversals that touch the condtrees). Without the record-syntax pattern synonym, one would need to pattern match on the entire `GenericPackageDescription` to change a single field, which is not ergonomic.
> If the accessor flattens the structure then the setter won't know whether to update the common stanza or the specific component.
Either the accessors or their callers have to decide whether to update the common stanza or the specific component; this is a hard problem to solve. Should we choose not to flatten the structure, the call sites of the accessors would then have to decide where to make the modification (e.g. the common stanza or the component). The modification must also not create missing imports in any component. With the pattern synonym, all imports are merged on the fly so that accessor call sites don't have to deal with the implementation details of common-stanza resolution.
This is done by:

- merging the imports on the fly using the first part of the pattern synonym, and
- constructing the data structure using the second part of the pattern synonym.
cf. the semantics of accessors using bidirectional pattern synonyms [^1].
Furthermore, the first part and the second part of an explicitly bidirectional pattern synonym must have the same type. If we wanted to be able to construct `GenericPackageDescription` with import information in the second part (i.e. not flatten the condtrees), the first part must not merge them either, which renders the whole "compatibility accessors" idea futile.
We could also choose not to have a right-hand side, but due to how pattern synonyms work [^1], we wouldn't be able to use these fields as setters, only as accessors. This is also a valid option.
As a side note, PatternSynonyms would give us a new set of "compatibility" accessors that don't work together with the old ones. This can also be a footgun (or just plain annoying) for other people working on cabal.
The following snippet demonstrates this:

```haskell
-- This doesn't work
foo
  { internalField = bar       -- a field from the internal description
  , compatibilityField = baz  -- a field from the new "compatibility" pattern
  }

-- We have to do this instead, because the two fields don't come from
-- the same pattern.
foo
  { internalField = bar       -- a field from the internal description
  }
  { compatibilityField = baz  -- a field from the new "compatibility" pattern
  }
```
What do you think about this trade-off for backward compatibility? Are there different solutions/details that I might be missing?
> How much are these pattern synonyms used for setting fields?
I grepped with `\{ <gpdFieldName>` for each field name of the gpd pattern synonym and only found three occurrences. They all generate a dummy gpd, either for partial parsing failures, for backward-compatibility functions, or for tests.
> It would be helpful if you could lay out some comments pointing out how to review the patch since it is rather large.
I will come back to this later :)
[^1]: Record pattern synonyms and record updates: https://gitlab.haskell.org/ghc/ghc/-/wikis/pattern-synonyms/record-pattern-synonyms#record-updates
> It would be helpful if you could lay out some comments pointing out how to review the patch since it is rather large.
We annotate each node of a `CondTree` with its imports instead of merging them in as we go. Notably, the `gpdCommonStanza` field holds a map of `WithImports BuildInfo`s indexed by common-stanza name, and each `CondTree ConfVar [Dependency] a` is now `CondTree ConfVar [Dependency] (WithImports a)`. This representation can capture recursive imports within nested conditionals as well as transitive imports. During import resolution, we first resolve the imports within the common stanzas themselves, and then resolve the imports for each component.
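The shape is roughly the following (illustrative only; the actual definitions in the patch may differ, and `ImportName` is again a placeholder):

```haskell
type ImportName = String -- placeholder for the patch's own name type

-- Each CondTree node's payload remembers which common stanzas it
-- imports, instead of having them merged in immediately.
data WithImports a = WithImports
  { nodeImports :: [ImportName] -- imports declared at this node, in order
  , nodeData :: a               -- the node's own, still-unmerged data
  }

-- Resolution then runs in the two phases described above:
--   1. resolve imports among the common stanzas themselves
--      (they may import one another, possibly under conditionals);
--   2. fold each component's imports into its own data.
```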
In the parser test common-conditional you can compare the cabal file with its parsed expression and see this:
https://github.com/haskell/cabal/blob/6ddbcc40f18d2732878c0fe2c726196a7b7da890/Cabal-tests/tests/ParserTests/regressions/common-conditional.cabal
https://github.com/haskell/cabal/blob/6ddbcc40f18d2732878c0fe2c726196a7b7da890/Cabal-tests/tests/ParserTests/regressions/common-conditional.expr
The merging logic lives in `Cabal-syntax/src/Distribution/Types/GenericPackageDescription.hs` in `mergeImports`. The specific merging function for each field is named `condLibrary'`, `condSubLibraries'`, and so on.
Previously, merging was done in `Cabal-syntax/src/Distribution/PackageDescription/Parsec.hs` by keeping a common-stanza map as state, adding entries to it and merging from it as parsing proceeded. That map is essentially the same as the one in this patch; we only defer the merging process.
The type `TestSuite` has an internal parsing representation, `TestSuiteStanza`. Once a `TestSuiteStanza` configuration is validated, it can be turned into a `TestSuite`. Previously it was validated and converted in one go during parsing; we have split that function so that conversion and validation are done separately. The `TestSuiteStanza` type now lives in `Cabal-syntax/src/Distribution/Types/TestSuiteStanza.hs`.
To retrieve a `CondTree ConfVar [Dependency] TestSuite` (the merged, validated, and converted representation), we first have to merge the imports and then convert. It doesn't make sense to validate an incomplete `TestSuiteStanza` without its imports. We assume the conversion is valid because validation happens during parsing; other code that wants to manipulate a `TestSuiteStanza` after parsing should validate again before obtaining the `CondTree ConfVar [Dependency] TestSuite`.
The same internal-representation split applies to `Benchmark` and its counterpart `BenchmarkStanza`.
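In signatures, the split looks roughly like this (hypothetical names and bodies; the actual functions in the patch may differ):

```haskell
import Distribution.Fields.ParseResult (ParseResult)
import Distribution.Parsec.Position (Position)
import Distribution.Types.TestSuite (TestSuite)
import Distribution.Types.TestSuiteStanza (TestSuiteStanza) -- new in this patch

-- Validation reports problems through the parser's error machinery,
-- and is run against a stanza whose imports are already merged in.
validateTestSuiteStanza :: Position -> TestSuiteStanza -> ParseResult ()
validateTestSuiteStanza = undefined -- sketch only

-- Conversion is pure and assumes validation already happened; code
-- editing a TestSuiteStanza after parsing should re-validate first.
stanzaToTestSuite :: TestSuiteStanza -> TestSuite
stanzaToTestSuite = undefined -- sketch only
```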
Let me know if I can elaborate on some details!
I downloaded the latest version of every package on Hackage using nh2/hackage-download and grepped to see how much downstream code uses the `GenericPackageDescription` accessors as setters. There are 60 occurrences.
You can also view it here using serokell's tool, though I don't know how exhaustive it is.
The regex pattern I used: `\{[ \n]*(packageDescription|gpdScannedVersion|genPackageFlags|condLibrary|condSubLibraries|condForeignLibs|condExecutables|condTestSuites|condBenchmarksUnmerged)[ \n]*=`. Requiring the `=` sign excludes usages of gpd fields as getters.
60 occurrences of accessor usage on latest Hackage:
```
./autopack-0.0.0.0/src/Autopack.hs: { condLibrary = condLibrary gPackDescr <&> \condLib -> condLib
./LibClang-3.8.0/Setup.hs: gpd' = gpd { condLibrary = Just (cdt { condTreeData = lib' }) }
./cabal2nix-2.20.1/src/Distribution/Nixpkgs/Haskell/PackageSourceSpec.hs:setCabalFileHash sha256 gpd = gpd { packageDescription = (packageDescription gpd) {
./cabal-doctest-1.0.12/src/Distribution/Extra/Doctest.hs: { condTestSuites = map f (condTestSuites gpd)
./cabal-edit-0.1.0.0/exe/Main.hs:setBuildInfo binfo pkg@GenericPackageDescription {..} = pkg {condLibrary = fmap go condLibrary}
./cabal-edit-0.1.0.0/exe/Main.hs:setDeps deps pkg@GenericPackageDescription {..} = pkg {condLibrary = fmap go condLibrary}
./constraints-deriving-1.1.1.2/Setup.hs:addReexportsGPD gpd = gpd { condLibrary = addReexportsCT <$> condLibrary gpd }
./Cabal-3.16.0.0/src/Distribution/PackageDescription/Check.hs: { packageDescription = pd
./Cabal-ide-backend-1.23.0.0/Distribution/Simple/Configure.hs: pkg_descr0'' = pkg_descr0 { condTestSuites = flaggedTests
./Cabal-syntax-3.16.0.0/src/Distribution/PackageDescription/PrettyPrint.hs: { packageDescription = pd
./cabal-install-3.16.0.0/tests/IntegrationTests2.hs: { packageDescription = emptyPackageDescription{package = pkgid}
./cabal-install-3.16.0.0/src/Distribution/Client/IndexUtils.hs: { packageDescription =
./base-compat-migrate-0.2.0.1/Generate.hs: { condLibrary =
./amazonka-elasticsearch-2.0/gen/Amazonka/ElasticSearch/UpdatePackage.hs:updatePackage_packageDescription = Lens.lens (\UpdatePackage' {packageDescription} -> packageDescription) (\s@UpdatePackage' {} a -> s {packageDescription = a} :: UpdatePackage)
./amazonka-elasticsearch-2.0/gen/Amazonka/ElasticSearch/Types/PackageDetails.hs:packageDetails_packageDescription = Lens.lens (\PackageDetails' {packageDescription} -> packageDescription) (\s@PackageDetails' {} a -> s {packageDescription = a} :: PackageDetails)
./amazonka-elasticsearch-2.0/gen/Amazonka/ElasticSearch/CreatePackage.hs: { packageDescription =
./amazonka-elasticsearch-2.0/gen/Amazonka/ElasticSearch/CreatePackage.hs:createPackage_packageDescription = Lens.lens (\CreatePackage' {packageDescription} -> packageDescription) (\s@CreatePackage' {} a -> s {packageDescription = a} :: CreatePackage)
./cabal-add-0.2/src/Distribution/Client/Add.hs: Right GenericPackageDescription {packageDescription = PackageDescription {specVersion = CabalSpecV1_0}} ->
./cabal-auto-expose-0.1.0.1/src/Distribution/Simple/AutoExpose.hs: gpd { condLibrary = fmap (updateCondTreeLib exposedLib) (condLibrary gpd)
./dist-upload-0.0.4/Main.hs: {packageDescription =
./stackage2nix-0.7.2/src/AllCabalHashes.hs: { packageDescription = (packageDescription cabal)
./stackage2nix-0.7.2/src/Distribution/Nixpkgs/Haskell/FromStack.hs:removeTests gd = gd { condTestSuites = [] }
./amazonka-opensearch-2.0/gen/Amazonka/OpenSearch/UpdatePackage.hs:updatePackage_packageDescription = Lens.lens (\UpdatePackage' {packageDescription} -> packageDescription) (\s@UpdatePackage' {} a -> s {packageDescription = a} :: UpdatePackage)
./amazonka-opensearch-2.0/gen/Amazonka/OpenSearch/Types/PackageDetails.hs:packageDetails_packageDescription = Lens.lens (\PackageDetails' {packageDescription} -> packageDescription) (\s@PackageDetails' {} a -> s {packageDescription = a} :: PackageDetails)
./amazonka-opensearch-2.0/gen/Amazonka/OpenSearch/CreatePackage.hs: { packageDescription =
./amazonka-opensearch-2.0/gen/Amazonka/OpenSearch/CreatePackage.hs:createPackage_packageDescription = Lens.lens (\CreatePackage' {packageDescription} -> packageDescription) (\s@CreatePackage' {} a -> s {packageDescription = a} :: CreatePackage)
./ide-backend-0.10.0.1/IdeSession/Cabal.hs: { packageDescription = pkgDesc
./ide-backend-0.10.0.1/IdeSession/Cabal.hs: { packageDescription = pkgDesc
./ide-backend-0.10.0.1/IdeSession/Cabal.hs: { packageDescription = pkgDescFromName libName version
./hpack-0.39.0/test/Hpack/RenderSpec.hs: renderPackage_ package {packageDescription = Just "foo\n\nbar\n"} `shouldBe` unlines [
./hpack-0.39.0/test/Hpack/RenderSpec.hs: renderPackageWith defaultRenderSettings 16 [] [] package {packageDescription = Just "foo\n\nbar\n"} `shouldBe` unlines [
./hpack-convert-1.0.1/test/Hpack/RunSpec.hs: renderPackage_ package {packageDescription = Just "foo\n\nbar\n"} `shouldBe` unlines [
./hpack-convert-1.0.1/test/Hpack/RunSpec.hs: renderPackage defaultRenderSettings 16 [] [] package {packageDescription = Just "foo\n\nbar\n"} `shouldBe` unlines [
./hackport-0.9.1.0/src/Merge/Dependencies.hs:mkRetroPD pd = RetroPackageDescription { packageDescription = pd, buildDepends = exeAndLibDeps pd }
./hackport-0.9.1.0/cabal/Cabal-syntax/src/Distribution/PackageDescription/PrettyPrint.hs: { packageDescription = pd
./hackport-0.9.1.0/cabal/cabal-install/src/Distribution/Client/IndexUtils.hs: { packageDescription = emptyPackageDescription
./hackport-0.9.1.0/cabal/Cabal-tests/tests/UnitTests/Distribution/Types/GenericPackageDescription.hs: [ ("packageDescription", \gpd -> gpd { packageDescription = undefined })
./hackport-0.9.1.0/cabal/Cabal-tests/tests/UnitTests/Distribution/Types/GenericPackageDescription.hs: , ("genPackageFlags", \gpd -> gpd { genPackageFlags = undefined })
./hackport-0.9.1.0/cabal/Cabal-tests/tests/UnitTests/Distribution/Types/GenericPackageDescription.hs: , ("condLibrary", \gpd -> gpd { condLibrary = undefined })
./hackport-0.9.1.0/cabal/Cabal-tests/tests/UnitTests/Distribution/Types/GenericPackageDescription.hs: , ("condSubLibraries", \gpd -> gpd { condSubLibraries = undefined })
./hackport-0.9.1.0/cabal/Cabal-tests/tests/UnitTests/Distribution/Types/GenericPackageDescription.hs: , ("condForeignLibs", \gpd -> gpd { condForeignLibs = undefined })
./hackport-0.9.1.0/cabal/Cabal-tests/tests/UnitTests/Distribution/Types/GenericPackageDescription.hs: , ("condExecutables", \gpd -> gpd { condExecutables = undefined })
./hackport-0.9.1.0/cabal/Cabal-tests/tests/UnitTests/Distribution/Types/GenericPackageDescription.hs: , ("condTestSuites", \gpd -> gpd { condTestSuites = undefined })
./hackport-0.9.1.0/cabal/Cabal-tests/tests/custom-setup/CabalDoctestSetup.hs: { condTestSuites = map f (condTestSuites gpd)
./haskell-gi-0.26.17/lib/Data/GI/CodeGen/CabalHooks.hs: gpd' = gpd {condLibrary = Just cL'}
./haskell-gi-0.26.17/lib/Data/GI/CodeGen/CabalHooks.hs: gpd' = gpd {condLibrary = Just cL'}
./jailbreak-cabal-1.4.1/Main.hs:stripVersionRestrictions pkg = pkg { condLibrary = fmap relaxLibraryTree (condLibrary pkg)
./gpah-0.0.2/src/Generics/GPAH/Utils.hs:getSrcDirs (GenericPackageDescription {condLibrary = l, condExecutables = es, condTestSuites = ts}) =
./gpah-0.0.2/src/Generics/GPAH/Utils.hs:ghcOpts (GenericPackageDescription {condLibrary = l, condExecutables = es, condTestSuites = ts}) =
./gpah-0.0.2/src/Generics/GPAH/Utils.hs:cppOpts (GenericPackageDescription {condLibrary = l, condExecutables = es, condTestSuites = ts}) =
./g2-0.2.0.0/src/G2/Translation/Cabal/Cabal.hs: { condLibrary = cl
./exherbo-cabal-0.2.1.1/src/Main.hs: descr' = descr { packageDescription = pkgDescr' }
./hackage-proxy-0.3.0.1/TweakCabal.hs: { condLibrary = tweakCondTree <$> condLibrary gpd
./leksah-0.16.2.2/src/IDE/Pane/PackageEditor.hs: { packageDescription = pd{
./leksah-0.16.2.2/src/IDE/ImportTool.hs: gpd { condLibrary = addDepToLib d (condLibrary gpd),
./leksah-0.16.2.2/src/IDE/Package.hs: addModule LibExposedMod gpd@GenericPackageDescription{condLibrary = Just lib} =
./leksah-0.16.2.2/src/IDE/Package.hs: gpd {condLibrary = Just (addModToLib moduleName lib)}
./leksah-0.16.2.2/src/IDE/Package.hs: addModule LibOtherMod gpd@GenericPackageDescription{condLibrary = Just lib} =
./leksah-0.16.2.2/src/IDE/Package.hs: gpd {condLibrary = Just (addModToBuildInfoLib moduleName lib)}
./voyeur-0.1.0.1/Setup.hs:onPkgDescr f gpd = gpd { packageDescription = f (packageDescription gpd) }
```