
Out of memory on a big query

Open Raveline opened this issue 6 years ago • 11 comments

I'm trying to compile a query returning 37 fields, using 14 joins, a single where clause, a group by on 33 fields, and an order by on 4 fields. Sadly, I get "unable to commit 745537536 bytes of memory" when GHC tries to compile the module containing the query. (I cannot post the query for IP reasons, sorry.)

Do you have any idea of what I could do to help the compiler on this ?

Raveline avatar Dec 23 '18 10:12 Raveline

Oh no :-( that’s not good. I tried Googling the error message but nothing useful came out. You could try putting the query alone in its own module or somehow giving GHC more memory (swap space?) to work with. Squeal uses type level lists which are quite inefficient when calculating Join and Has and the rest. At runtime all that inefficiency should completely go away but compile time is a different story. If I had an equivalent example I could investigate more thoroughly.

echatav avatar Dec 24 '18 06:12 echatav

Putting the query alone in its own module doesn't seem to make much of a difference. I still have to check if splitting in several functions helps (though it probably shouldn't !). However, a colleague with more experience in GHC suggested I add a pragma on the file containing the query:

{-# OPTIONS_GHC -fno-specialise -fno-full-laziness  #-}

It still consumes 3.5 GB but that's already way more manageable.

Raveline avatar Dec 24 '18 08:12 Raveline

Some more information about this, thanks to the remarkable investigative work done by @haitlahcen. There are two issues at hand:

  • One with Stack and its use of dump-hi files (a non-binary version of GHC's hi), leading to very big files being printed out, with very high memory usage.
  • One with GHC systematically unfolding types, which takes a lot of memory for the types we use in Squeal.

The current workaround, rather than -fno-specialise and -fno-full-laziness, is to use -fomit-interface-pragmas, but @haitlahcen is doing his best to solve the issues in both Stack and GHC. See his issue here for more information: https://ghc.haskell.org/trac/ghc/ticket/8095#comment:58.

For current Squeal users with problematic compilation time and memory usage, -fomit-interface-pragmas is probably the best current solution.
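
For reference, a minimal sketch of how the flag can be scoped to a single file, in the same way as the earlier pragma. The module name `MyApp.Queries` is a placeholder, not something from this thread.

```haskell
-- Applies only to this module: GHC omits inessential information
-- (unfoldings, specialisations) from the interface file it writes.
{-# OPTIONS_GHC -fomit-interface-pragmas #-}

-- Hypothetical module name; the idea is to keep the heavy Squeal
-- queries together so the flag does not affect the rest of the project.
module MyApp.Queries where

-- ... Squeal query definitions go here ...
```

The trade-off is that other modules can no longer inline or specialise definitions exported from this one, which is usually acceptable for a module that only holds queries. GHC already implies this flag at -O0.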

adfretlink avatar Jan 25 '19 08:01 adfretlink

Wow! Thanks so much @adfretlink and @haitlahcen ! This is great. Sorry Squeal stresses GHC out so much.

echatav avatar Jan 25 '19 14:01 echatav

Hey! I've opened an issue for stack as well

haitlahcen avatar Jan 25 '19 16:01 haitlahcen

Manually unrolling recursive type families should radically improve compile time and memory usage.
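
One possible reading of "unrolling", sketched on a toy type family rather than on Squeal's actual families (`Length`, `LengthU` and `Columns` are illustrative names only): the unrolled version consumes four list elements per reduction step instead of one, so GHC performs fewer steps on long type-level lists.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE UndecidableInstances #-}
module Main where

import Data.Proxy (Proxy (..))
import GHC.TypeLits (Nat, natVal, type (+))

-- Naive recursion: one reduction step per list element.
type family Length (xs :: [k]) :: Nat where
  Length '[] = 0
  Length (x ': xs) = 1 + Length xs

-- Manually unrolled: short lists are handled directly, longer lists
-- are consumed four elements at a time.
type family LengthU (xs :: [k]) :: Nat where
  LengthU '[] = 0
  LengthU '[a] = 1
  LengthU '[a, b] = 2
  LengthU '[a, b, c] = 3
  LengthU (a ': b ': c ': d ': xs) = 4 + LengthU xs

-- Toy stand-in for a list of column types.
type Columns = '[Int, Bool, Char, (), Double, Integer]

naiveLen :: Integer
naiveLen = natVal (Proxy :: Proxy (Length Columns))

unrolledLen :: Integer
unrolledLen = natVal (Proxy :: Proxy (LengthU Columns))

main :: IO ()
main = print (naiveLen, unrolledLen)
```

Both families compute the same result; only the number of reduction steps differs. How much this helps for Squeal's own families would need to be measured.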

Might open a PR today.

ilyakooo0 avatar Oct 16 '19 13:10 ilyakooo0

Small update on this topic: we've just squashed our migrations, redefining our Schema as if it were the initial one. We had around 30 migrations on top of it. Compilation time for the project went from 40 minutes to 7! So there's at least a lead as to the "main culprit" of compilation cost.

adfretlink avatar Nov 12 '19 15:11 adfretlink

Pretty interesting. I wonder what would happen with aggressive use of partial type signatures. If all intermediate schemas are wild-carded _, and only the initial and final schemas are explicitly typed, I wonder if that would help both from a compilation efficiency perspective and a code cleanliness perspective...
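
A minimal sketch of what wildcarding an intermediate type looks like with GHC's PartialTypeSignatures extension. The types `V0` and `V2` and the functions are toy stand-ins, not Squeal definitions, and whether this affects compile time for real schemas is untested here.

```haskell
{-# LANGUAGE PartialTypeSignatures #-}
-- Silence the warning GHC emits for each wildcard it fills in.
{-# OPTIONS_GHC -Wno-partial-type-signatures #-}
module Main where

-- Toy stand-ins for the initial and final schema versions.
newtype V0 = V0 Int
newtype V2 = V2 String

-- Only the endpoints are written out; the intermediate type is left
-- as a wildcard and inferred by GHC (here, a pair of Int and String).
toMiddle :: V0 -> _
toMiddle (V0 n) = (n, "note")

fromMiddle :: _ -> V2
fromMiddle (n, note) = V2 (note ++ " #" ++ show (n :: Int))

migrateAll :: V0 -> V2
migrateAll = fromMiddle . toMiddle

rendered :: String
rendered = case migrateAll (V0 3) of V2 s -> s

main :: IO ()
main = putStrLn rendered
```

Note that GHC accepts wildcards in the signatures of value bindings like these, but not inside a `type` synonym declaration.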

echatav avatar Nov 14 '19 18:11 echatav

How would we do this ?

Something like:

type Base = -- some schema

type AddATable = Create "myTable" ('Table SomeTable) _

type FinalMig = Alter "myTable" ('Table SomeTableV2) AddATable

But how would GHC ultimately be able to work out the order of the migrations?

adfretlink avatar Nov 15 '19 08:11 adfretlink

The way I do it in my projects is I have a directory structure like

Schema.hs
Schema/V0.hs
Schema/V1.hs
Schema/V2.hs
..

where each V{n}.hs has a SchemasType called DB (or Schemas) and for n > 0

setup :: Definition V{n-1}.DB DB
teardown :: Definition DB V{n-1}.DB
migration :: Migration Definition V{n-1}.DB DB

and Schema.hs has a

migrations :: AlignedList (Migration Definition) V0.DB V{max}.DB
migrations = V1.migration :>> .. :>> V{max}.migration :>> Done

and re-exports V{max}.DB. And every other module imports the DB from Schema.hs.

Now, we shouldn't need to define any of the intermediate DBs between V0.DB and V{max}.DB because they should all be inferable and nowhere else referenced. I don't know if that would speed up or slow down or have no effect on compilation time, but it would cut down on some redundancy. I haven't settled on best practice for migrations over time yet. I read this review of Beam's migration system which was pretty negative. Some of the critiques might apply to Squeal as well. I'm a little worried that migrations in Squeal are redundant and cause compilation time issues.
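
To illustrate why the intermediate types need not be named in the type of `migrations`, here is a self-contained sketch of a type-aligned list. It reuses the names `AlignedList`, `:>>` and `Done` from the layout above but is a simplified stand-in, not Squeal's own module; `Step`, `runAll` and the `V0`..`V2` types are assumptions made for the example.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

-- Each element's output type must match the next element's input type.
-- Only the first input and the last output appear in the list's type.
data AlignedList p x0 x1 where
  Done :: AlignedList p x x
  (:>>) :: p x0 x1 -> AlignedList p x1 x2 -> AlignedList p x0 x2
infixr 7 :>>

-- Toy "migration": a pure function between toy schema versions.
newtype Step a b = Step (a -> b)

newtype V0 = V0 Int
newtype V1 = V1 Int
newtype V2 = V2 String

-- Run every step in order, threading the value through.
runAll :: AlignedList Step a b -> a -> b
runAll Done x = x
runAll (Step f :>> rest) x = runAll rest (f x)

-- V1 is used by the steps but does not appear in this signature.
migrations :: AlignedList Step V0 V2
migrations =
  Step (\(V0 n) -> V1 (n + 1)) :>> Step (\(V1 n) -> V2 (show n)) :>> Done

finalValue :: String
finalValue = case runAll migrations (V0 41) of V2 s -> s

main :: IO ()
main = putStrLn finalValue
```

The type checker still has to unify each step's output with the next step's input, so this shows the shape of the API rather than any guarantee about compilation cost.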

echatav avatar Nov 22 '19 19:11 echatav

@adfretlink Thank you for documenting the workaround using {-# OPTIONS_GHC -fomit-interface-pragmas #-}. I had to use this + globally disabling optimizations using stack build --ghc-options='-O0' to have it pass on the CircleCI free tier (4GB of RAM) without running out of memory.

In case anyone needs a repro, here’s a PR on my open source project that exhibits this problem: https://github.com/zoomhub/zoomhub/pull/158

@echatav Thanks for documenting how you organize your schema migrations. I ended up doing something similar on my own but it’s nice to see it being validated: https://github.com/zoomhub/zoomhub/tree/69f420ee9f2d6b88392cfa2657948e1c2c74db30/src/ZoomHub/Storage/PostgreSQL/Schema

gasi avatar Feb 28 '21 01:02 gasi