postgresql-simple icon indicating copy to clipboard operation
postgresql-simple copied to clipboard

space leaks on executeMany

Open s9gf4ult opened this issue 13 years ago • 11 comments

Hello. I am trying to use postgresql-simple to store many same data in the database. When my function looks like this

postSaveCandles :: Connection -> [Candle] -> IO ()
postSaveCandles c cndls = do
  executeMany c "insert into candles (open, close, minc, maxc, volume, timec, periodtype, periodsecs) values (?,?,?,?,?,?,?,?)" cndls
  return ()

My program leaks. I mean it constructs very many Candle consuming memory and then reduces them and executes query.

When i do this

postSaveCandles :: Connection -> [Candle] -> IO ()
postSaveCandles c cndls = do
  mapM_ (execute c "insert into candles (open, close, minc, maxc, volume, timec, periodtype, periodsecs) values (?,?,?,?,?,?,?,?)") cndls
  return ()

It works smothsly, inserts each candle one by one, but not very fast. I think bacause of many exec's.

Candle looks like this

data Period = FixedSecs !Int64 -- ^ Fixed time period defined by seconds count (i.e. hour is 3600 seconds)
            | Month           -- ^ Month time period
            | Year            -- ^ Year time period
            deriving (Eq, Show)

data Candle = Candle {candleOpenCost :: ! Double  -- ^ Opening cost of candle
                     ,candleCloseCost :: ! Double -- ^ Closing cost of candle
                     ,candleMinCost :: ! Double   -- ^ Minimal cost in the candle
                     ,candleMaxCost :: ! Double   -- ^ Maximal cost in the candle
                     ,candleVolume :: ! Double    -- ^ Volume of all deals in candle
                     ,candleTime :: !UTCTime       -- ^ Start time of the candle
                     ,candlePeriod :: !Period      -- ^ Candle time period
                     }
            deriving (Eq, Show)

And have instance ToRow

instance ToRow Candle where
  toRow (Candle {candleOpenCost = oc,
                 candleCloseCost = cc,
                 candleMinCost = minc,
                 candleMaxCost = maxc,
                 candleVolume = v,
                 candleTime = tm,
                 candlePeriod = per}) = [toField oc,
                                         toField cc,
                                         toField minc,
                                         toField maxc,
                                         toField v,
                                         toField tm] ++ (ptofield per)
    where
      ptofield (FixedSecs s) = [toField ("seconds" :: String),
                                toField s]
      ptofield Month = [toField ("month" :: String),
                        toField (0 :: Int64)]
      ptofield Year = [toField ("year" :: String),
                       toField (0 :: Int64)]

Candle generates just random with this instance

instance Random Candle where
  random g = (Candle {candleOpenCost = oc,
                      candleCloseCost = cc,
                      candleMinCost = mc,
                      candleMaxCost = mac,
                      candleVolume = v,
                      candleTime = t,
                      candlePeriod = p}, ng)
    where
      (oc, g1) = random g
      (cc, g2) = random g1
      (mc, g3) = random g2
      (mac, g4) = random g3
      (v, g5) = random g4
      (t, g6) = random g5
      (p, ng) = random g6
  randomR (Candle {candleOpenCost = oc1,
                   candleCloseCost = cc1,
.....

which is quite lazy. here is all code https://github.com/s9gf4ult/hadan/blob/master/post.hs

s9gf4ult avatar Sep 23 '12 09:09 s9gf4ult

Well, the space leak does need to be investigated and fixed, but you can make your original code run much faster by running it in a single transaction instead of one transaction per insert as you are now.

postSaveCandles :: Connection -> [Candle] -> IO ()
postSaveCandles c cndls = withTransaction c $ do
  mapM_ (execute c "insert into candles (open, close, minc, maxc, volume, timec, periodtype, periodsecs) values (?,?,?,?,?,?,?,?)") cndls
  return ()

lpsmith avatar Sep 23 '12 16:09 lpsmith

But i actually do

dbMakeCandles count = do
  con <- connect $ ConnectInfo "127.0.0.1" 5432 "test" "test" "test"
  cndls <- genCandles
  withTransaction con $ postSaveCandles con $ take count $ map (\x -> x `deepseq` x) cndls

I will try to write more tests to narrow this problem.

s9gf4ult avatar Sep 23 '12 17:09 s9gf4ult

Ok, what do you mean by a "space leak"? Have you tried running the heap profiler on your program?

Unfortunately, Haskell's memory consumption is not, in general, a local property. How much space a function uses is not only a question of how the function is written, but also how the result is consumed. Just be aware of that issue when trying to isolate this issue.

lpsmith avatar Sep 24 '12 16:09 lpsmith

Have you tried running the heap profiler on your program

Yes. I am not sure i understand what i got in the profiling, i am not very familiar with heap profiling in Haskell, but there is truangular peak while program runs. It generates huge amount of some data, then folding it down.

Just be aware of that issue when trying to isolate this issue.

Ok

s9gf4ult avatar Sep 25 '12 04:09 s9gf4ult

Out of curiosity, have you tried compiling postgresql-simple with heap profiling enabled?

This may not actually be a space leak, though it's entirely possible that executeMany is temporarily using significantly more memory than it should. Though if it uses more than O(n) memory, I would classify that as a sort of space leak.

What I would suggest is writing a test program that exhibits the issue. I see that you are using UTCTime, so the problem may be in my re-worked time printers as well. In your test program, I would try to use the same mix of field types you use in this code. And don't worry too much about trying to locate the problem, I'm happy to help with that.

lpsmith avatar Sep 26 '12 17:09 lpsmith

I have written some tests

https://github.com/s9gf4ult/teststorage

here two executables GenerateTest and PostTest. First generate given amount of objects and dies, second does "executeMany" on given amount of Storagles and then select is back. Both executables take one parameter with amount of storables to operate on, default is 10000 if parameter is skipped. To work PostTest you need "test" as user, password and database name on localhost, or change it manually on line 14 of PostTest.hs

Memory consumed 5 times more on inserts then on selects. I think this is because of Haskell does not release intermediate data memory until completely execute database query. This is posible not a space leak but some king of ... what ?

s9gf4ult avatar Oct 01 '12 07:10 s9gf4ult

I accidentally closed this issue ..

s9gf4ult avatar Oct 01 '12 07:10 s9gf4ult

By the way, did you ever get this resolved? The link to your code is no longer available...

lpsmith avatar May 08 '13 09:05 lpsmith

Oh, I was looking at the link in your original post; thanks for the test cases!

lpsmith avatar May 08 '13 09:05 lpsmith

Ok, I did take a brief look at your test case; the one thing that stands out to me is that your insert appears to take quadratic time: doubling the number of elements in the test case approximately quadruples the amount time it takes to make the insert.

And I did verify that the problem is inside postgresql-simple itself. It only appears to be a problem when strings need to be escaped via libpq... my first attempt to replicate this outside of your test case was not successful because I only used integers which do not get escaped via libpq.

In any case, here's a stripped down test case that exhibits the behavior:

{-# LANGUAGE OverloadedStrings, BangPatterns #-}

import Database.PostgreSQL.Simple
import System.Environment

main = do
   (nstr:_) <- getArgs
   let n = read nstr :: Int
   c <- connectPostgreSQL ""
   !q <- formatMany c "insert into foo values (?,?)" [ (i,show i) | i <- [1..n] ]
   close c

I then compiled this file and then timed it using the unix time command. I don't know if this is related to the memory consumption problems you are experiencing, but I figure there is a good chance it is.

lpsmith avatar Jul 08 '13 14:07 lpsmith

Ok, I fixed postgresql-libpq's binding to PQescapeStringConn in bug #70, and it appears to be pretty close to linear time now. Could you upgrade to postgresql-libpq-0.8.2.3 and see if this fixed your space problem to your satisfaction?

lpsmith avatar Jul 09 '13 00:07 lpsmith