space leaks on executeMany
Hello. I am trying to use postgresql-simple to store many same data in the database. When my function looks like this
postSaveCandles :: Connection -> [Candle] -> IO ()
postSaveCandles c cndls = do
executeMany c "insert into candles (open, close, minc, maxc, volume, timec, periodtype, periodsecs) values (?,?,?,?,?,?,?,?)" cndls
return ()
My program leaks. I mean it constructs very many Candle consuming memory and then reduces them and executes query.
When i do this
postSaveCandles :: Connection -> [Candle] -> IO ()
postSaveCandles c cndls = do
mapM_ (execute c "insert into candles (open, close, minc, maxc, volume, timec, periodtype, periodsecs) values (?,?,?,?,?,?,?,?)") cndls
return ()
It works smothsly, inserts each candle one by one, but not very fast. I think bacause of many exec's.
Candle looks like this
data Period = FixedSecs !Int64 -- ^ Fixed time period defined by seconds count (i.e. hour is 3600 seconds)
| Month -- ^ Month time period
| Year -- ^ Year time period
deriving (Eq, Show)
data Candle = Candle {candleOpenCost :: ! Double -- ^ Opening cost of candle
,candleCloseCost :: ! Double -- ^ Closing cost of candle
,candleMinCost :: ! Double -- ^ Minimal cost in the candle
,candleMaxCost :: ! Double -- ^ Maximal cost in the candle
,candleVolume :: ! Double -- ^ Volume of all deals in candle
,candleTime :: !UTCTime -- ^ Start time of the candle
,candlePeriod :: !Period -- ^ Candle time period
}
deriving (Eq, Show)
And have instance ToRow
instance ToRow Candle where
toRow (Candle {candleOpenCost = oc,
candleCloseCost = cc,
candleMinCost = minc,
candleMaxCost = maxc,
candleVolume = v,
candleTime = tm,
candlePeriod = per}) = [toField oc,
toField cc,
toField minc,
toField maxc,
toField v,
toField tm] ++ (ptofield per)
where
ptofield (FixedSecs s) = [toField ("seconds" :: String),
toField s]
ptofield Month = [toField ("month" :: String),
toField (0 :: Int64)]
ptofield Year = [toField ("year" :: String),
toField (0 :: Int64)]
Candle generates just random with this instance
instance Random Candle where
random g = (Candle {candleOpenCost = oc,
candleCloseCost = cc,
candleMinCost = mc,
candleMaxCost = mac,
candleVolume = v,
candleTime = t,
candlePeriod = p}, ng)
where
(oc, g1) = random g
(cc, g2) = random g1
(mc, g3) = random g2
(mac, g4) = random g3
(v, g5) = random g4
(t, g6) = random g5
(p, ng) = random g6
randomR (Candle {candleOpenCost = oc1,
candleCloseCost = cc1,
.....
which is quite lazy. here is all code https://github.com/s9gf4ult/hadan/blob/master/post.hs
Well, the space leak does need to be investigated and fixed, but you can make your original code run much faster by running it in a single transaction instead of one transaction per insert as you are now.
postSaveCandles :: Connection -> [Candle] -> IO ()
postSaveCandles c cndls = withTransaction c $ do
mapM_ (execute c "insert into candles (open, close, minc, maxc, volume, timec, periodtype, periodsecs) values (?,?,?,?,?,?,?,?)") cndls
return ()
But i actually do
dbMakeCandles count = do
con <- connect $ ConnectInfo "127.0.0.1" 5432 "test" "test" "test"
cndls <- genCandles
withTransaction con $ postSaveCandles con $ take count $ map (\x -> x `deepseq` x) cndls
I will try to write more tests to narrow this problem.
Ok, what do you mean by a "space leak"? Have you tried running the heap profiler on your program?
Unfortunately, Haskell's memory consumption is not, in general, a local property. How much space a function uses is not only a question of how the function is written, but also how the result is consumed. Just be aware of that issue when trying to isolate this issue.
Have you tried running the heap profiler on your program
Yes. I am not sure i understand what i got in the profiling, i am not very familiar with heap profiling in Haskell, but there is truangular peak while program runs. It generates huge amount of some data, then folding it down.
Just be aware of that issue when trying to isolate this issue.
Ok
Out of curiosity, have you tried compiling postgresql-simple with heap profiling enabled?
This may not actually be a space leak, though it's entirely possible that executeMany is temporarily using significantly more memory than it should. Though if it uses more than O(n) memory, I would classify that as a sort of space leak.
What I would suggest is writing a test program that exhibits the issue. I see that you are using UTCTime, so the problem may be in my re-worked time printers as well. In your test program, I would try to use the same mix of field types you use in this code. And don't worry too much about trying to locate the problem, I'm happy to help with that.
I have written some tests
https://github.com/s9gf4ult/teststorage
here two executables GenerateTest and PostTest. First generate given amount of objects and dies, second does "executeMany" on given amount of Storagles and then select is back. Both executables take one parameter with amount of storables to operate on, default is 10000 if parameter is skipped. To work PostTest you need "test" as user, password and database name on localhost, or change it manually on line 14 of PostTest.hs
Memory consumed 5 times more on inserts then on selects. I think this is because of Haskell does not release intermediate data memory until completely execute database query. This is posible not a space leak but some king of ... what ?
I accidentally closed this issue ..
By the way, did you ever get this resolved? The link to your code is no longer available...
Oh, I was looking at the link in your original post; thanks for the test cases!
Ok, I did take a brief look at your test case; the one thing that stands out to me is that your insert appears to take quadratic time: doubling the number of elements in the test case approximately quadruples the amount time it takes to make the insert.
And I did verify that the problem is inside postgresql-simple itself. It only appears to be a problem when strings need to be escaped via libpq... my first attempt to replicate this outside of your test case was not successful because I only used integers which do not get escaped via libpq.
In any case, here's a stripped down test case that exhibits the behavior:
{-# LANGUAGE OverloadedStrings, BangPatterns #-}
import Database.PostgreSQL.Simple
import System.Environment
main = do
(nstr:_) <- getArgs
let n = read nstr :: Int
c <- connectPostgreSQL ""
!q <- formatMany c "insert into foo values (?,?)" [ (i,show i) | i <- [1..n] ]
close c
I then compiled this file and then timed it using the unix time command. I don't know if this is related to the memory consumption problems you are experiencing, but I figure there is a good chance it is.
Ok, I fixed postgresql-libpq's binding to PQescapeStringConn in bug #70, and it appears to be pretty close to linear time now. Could you upgrade to postgresql-libpq-0.8.2.3 and see if this fixed your space problem to your satisfaction?