squeal
Out of memory on a big query
I'm trying to compile a query returning 37 fields, using 14 joins, a single where clause, a group by on 33 fields, and an order by on 4 fields. Sadly, I get "unable to commit 745537536 bytes of memory" when GHC is compiling the module containing the query. (I cannot post the query for IP reasons, sorry.)
Do you have any idea what I could do to help the compiler with this?
Oh no :-( that’s not good. I tried Googling the error message but nothing useful came out. You could try putting the query alone in its own module or somehow giving GHC more memory (swap space?) to work with. Squeal uses type level lists which are quite inefficient when calculating Join and Has and the rest. At runtime all that inefficiency should completely go away but compile time is a different story. If I had an equivalent example I could investigate more thoroughly.
Putting the query alone in its own module doesn't seem to make much of a difference. I still have to check whether splitting it into several functions helps (though it probably shouldn't!). However, a colleague with more experience in GHC suggested I add a pragma to the file containing the query:
{-# OPTIONS_GHC -fno-specialise -fno-full-laziness #-}
It still consumes 3.5 GB but that's already way more manageable.
Some more information about this, thanks to the remarkable investigative work done by @haitlahcen. There are two issues at hand:
- One with Stack and its use of dump-hi files (a non-binary version of GHC's hi), leading to very big files being printed out, with very high memory usage.
- One with GHC systematically unfolding types, which takes a lot of memory for the types we use in Squeal.
The current workaround, rather than -fno-specialise and -fno-full-laziness, is to use -fomit-interface-pragmas, but @haitlahcen is doing his best to solve the issues in both Stack and GHC. See his issue here for more information: https://ghc.haskell.org/trac/ghc/ticket/8095#comment:58.
For current Squeal users with problematic compilation time and memory usage, -fomit-interface-pragmas is probably the best current solution.
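For reference, a per-module GHC flag like this goes in an OPTIONS_GHC pragma at the very top of the file, before the module header. The following is only a minimal sketch: the module body is a placeholder standing in for the module that holds the expensive query, not real Squeal code.

```haskell
-- File-level GHC options must appear before the module header.
{-# OPTIONS_GHC -fomit-interface-pragmas #-}
module Main (main) where

-- Placeholder body (an assumption for illustration): in a real project this
-- module would contain the large Squeal query that is expensive to compile.
placeholder :: String
placeholder = "module compiled with -fomit-interface-pragmas"

main :: IO ()
main = putStrLn placeholder
```

Because the flag is set per file, it can be limited to the modules that actually contain large queries or schema types.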
Wow! Thanks so much @adfretlink and @haitlahcen ! This is great. Sorry Squeal stresses GHC out so much.
Hey! I've opened an issue for stack as well
Manually unrolling recursive type families should radically improve compile time and memory usage.
Might open a PR today.
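To illustrate what "manually unrolling" a recursive type family can look like, here is a small self-contained sketch. `Lookup`, `LookupUnrolled` and `Columns` are hypothetical names made up for this example; they are not Squeal's actual `Has` or `Join` definitions, and whether unrolling helps in a given case has to be measured.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
module Main (main) where

import Data.Kind (Type)
import GHC.TypeLits (Symbol)

-- Plain recursive lookup: one recursive call per list element.
type family Lookup (key :: Symbol) (xs :: [(Symbol, Type)]) :: Type where
  Lookup key ('(key, v) ': _) = v
  Lookup key (_ ': xs) = Lookup key xs

-- Manually unrolled lookup: inspects four elements per recursive call,
-- so a long list needs roughly a quarter as many recursive calls.
type family LookupUnrolled (key :: Symbol) (xs :: [(Symbol, Type)]) :: Type where
  LookupUnrolled key ('(key, v) ': _) = v
  LookupUnrolled key (_ ': '(key, v) ': _) = v
  LookupUnrolled key (_ ': _ ': '(key, v) ': _) = v
  LookupUnrolled key (_ ': _ ': _ ': '(key, v) ': _) = v
  LookupUnrolled key (_ ': _ ': _ ': _ ': xs) = LookupUnrolled key xs

-- Hypothetical column list, used only for this example.
type Columns =
  '[ '("id", Int), '("name", String), '("active", Bool)
   , '("score", Double), '("initial", Char) ]

-- Both families resolve "initial" to Char, so both bindings type-check.
viaRecursion :: Lookup "initial" Columns
viaRecursion = 'x'

viaUnrolled :: LookupUnrolled "initial" Columns
viaUnrolled = 'x'

main :: IO ()
main = print (viaRecursion == viaUnrolled)
```

The two families compute the same result; the unrolled one only changes how many recursive calls GHC has to work through for a long list.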
Small update on this topic: we've just squashed our migrations, redefining our Schema as if it were the initial one. We had around 30 migrations on top of it. Compilation time for the project went from 40 minutes to 7! So there's at least a lead as to the "main culprit" of compilation cost.
Pretty interesting. I wonder what would happen with aggressive use of partial type signatures. If all intermediate schemas are wild-carded _, and only the initial and final schemas are explicitly typed, I wonder if that would help both from a compilation efficiency perspective and a code cleanliness perspective...
How would we do this?
Something like:
type Base = -- some schema
type AddATable = Create "myTable" ('Table SomeTable) _
type FinalMig = Alter "myTable" ('Table SomeTableV2) AddATable
But how would GHC ultimately be able to work out the order of the migrations?
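As a note on the wildcard idea: GHC rejects `_` inside a `type` synonym declaration, but with the PartialTypeSignatures extension it accepts wildcards in the type signature of a binding and infers them. The sketch below uses toy stand-ins (`Step`, `V0`, `V1`, `V2`, `andThen`, all hypothetical and unrelated to Squeal's real `Definition` type) to show an intermediate schema index being left to inference.

```haskell
{-# LANGUAGE PartialTypeSignatures #-}
{-# OPTIONS_GHC -Wno-partial-type-signatures #-}
module Main (main) where

-- Hypothetical stand-ins for three schema versions.
data V0
data V1
data V2

-- A toy "definition" indexed by the schema before and after it runs.
newtype Step from to = Step [String]

-- Steps compose only when the intermediate schema types line up.
andThen :: Step a b -> Step b c -> Step a c
andThen (Step xs) (Step ys) = Step (xs ++ ys)

createTable :: Step V0 V1
createTable = Step ["CREATE TABLE my_table (id int)"]

alterTable :: Step V1 V2
alterTable = Step ["ALTER TABLE my_table ADD COLUMN name text"]

-- The intermediate schema is a wildcard; GHC infers it to be V1.
firstHalf :: Step V0 _
firstHalf = createTable

-- Only the initial and final schemas are spelled out here.
allSteps :: Step V0 V2
allSteps = firstHalf `andThen` alterTable

statements :: Step from to -> [String]
statements (Step xs) = xs

main :: IO ()
main = mapM_ putStrLn (statements allSteps)
```

The ordering is still checked: `andThen` only type-checks if the inferred target of `firstHalf` matches the source of `alterTable`. Whether this reduces compile time for real Squeal schemas is an open question in this thread.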
The way I do it in my projects is I have a directory structure like
Schema.hs
Schema/V0.hs
Schema/V1.hs
Schema/V2.hs
..
where each V{n}.hs has a SchemasType called DB (or Schemas) and, for n > 0,
setup :: Definition V{n-1}.DB DB
teardown :: Definition DB V{n-1}.DB
migration :: Migration Definition V{n-1}.DB DB
and Schema.hs has a
migrations :: AlignedList (Migration Definition) V0.DB V{max}.DB
migrations = V1.migration :>> .. :>> V{max}.migration :>> Done
and re-exports V{max}.DB.
And every other module imports the DB from Schema.hs.
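The layout above relies on a type-aligned list to keep migrations in order. Below is a self-contained sketch of that idea, assuming nothing from Squeal itself: `Aligned`, `Mig`, `V0`, `V1` and `V2` are hypothetical names defined here for illustration, modelled loosely on the `AlignedList` and `:>>` used in the signatures above.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
module Main (main) where

import Data.Kind (Type)

-- Hypothetical stand-ins for the DB type of each schema version.
data V0
data V1
data V2

-- A type-aligned list: each element's target type must equal
-- the next element's source type.
data Aligned (p :: Type -> Type -> Type) (a :: Type) (b :: Type) where
  Done  :: Aligned p a a
  (:>>) :: p a b -> Aligned p b c -> Aligned p a c
infixr 7 :>>

-- A toy migration that carries only its name.
newtype Mig from to = Mig String

v1Migration :: Mig V0 V1
v1Migration = Mig "V1"

v2Migration :: Mig V1 V2
v2Migration = Mig "V2"

-- Swapping the two migrations here would be rejected by the type checker.
migrations :: Aligned Mig V0 V2
migrations = v1Migration :>> v2Migration :>> Done

-- Collect the migration names in order.
names :: Aligned Mig a b -> [String]
names Done = []
names (Mig n :>> rest) = n : names rest

main :: IO ()
main = print (names migrations)
```

In this sketch the order of migrations is fixed by the types of the individual steps, so only the endpoints need to appear in the signature of `migrations`.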
Now, we shouldn't need to define any of the intermediate DBs between V0.DB and V{max}.DB because they should all be inferable and nowhere else referenced. I don't know if that would speed up or slow down or have no effect on compilation time, but it would cut down on some redundancy. I haven't settled on best practice for migrations over time yet. I read this review of Beam's migration system which was pretty negative. Some of the critiques might apply to Squeal as well. I'm a little worried that migrations in Squeal are redundant and cause compilation time issues.
@adfretlink Thank you for documenting the workaround using {-# OPTIONS_GHC -fomit-interface-pragmas #-}. I had to combine it with globally disabling optimizations using stack build --ghc-options='-O0' to get it to pass on the CircleCI free tier (4GB of RAM) without running out of memory.
In case anyone needs a repro, here’s a PR on my open source project that exhibits this problem: https://github.com/zoomhub/zoomhub/pull/158
@echatav Thanks for documenting how you organize your schema migrations. I ended up doing something similar on my own but it’s nice to see it being validated: https://github.com/zoomhub/zoomhub/tree/69f420ee9f2d6b88392cfa2657948e1c2c74db30/src/ZoomHub/Storage/PostgreSQL/Schema