groundhog icon indicating copy to clipboard operation
groundhog copied to clipboard

Get rid of String for internal representations.

Open hansonkd opened this issue 10 years ago • 5 comments

I did a quick first pass here: I made library dependent strings into Text and SQL related strings into Utf8.Text in BSON, Redis and ReQL and the like are closer to Text than to String so it might be nice to have a more direct relationship for the mappings to the backend.

I also got rid of StringLike interface on flattenP since it would've converted into string and back. This also makes it easy to swap out what type Utf8 is, making it easy to try out different string representations.

It still needs some more work. So far I converted groundhog, part of groundhog-th and groundhog-sqlite https://github.com/lykahb/groundhog/compare/master...hansonkd:string-to-text?expand=1

I thought this would have had more of a speed bump, but it looks like it is about the same as current master so I am probably going to put it on hold. Although it still solves the problem of preventing encoding information from being lost (in an intermediate PersistString).

This also removes read for integer conversion in favor of Text.Reader. A parsing library should be found for datetime conversion.

Edit: This should only mean backend changes. Users can keep using Strings and have the same performance as before.

By my current benchmarks it increases the speed of Text fields 2x-4x (compared with Text fields in Master), speeds up query render (see critLite.hs) times between 1.5x and 3x, String fields remain about the same performance. Switching from String to Text will be about a 1.5x speedup.

hansonkd avatar Sep 07 '15 20:09 hansonkd

Actually I'll show my benchmarks:

With this bench:

{-# LANGUAGE GADTs, TypeFamilies, TemplateHaskell, QuasiQuotes, FlexibleInstances #-}

import Control.Monad.IO.Class (liftIO)
import Control.Monad
import Database.Groundhog.TH
import Database.Groundhog.Sqlite
import Data.Text (Text, pack)
import Data.Text as T
import Data.Maybe (fromJust)
data Person = Person {name :: Text, age :: Int, height :: Int} deriving (Eq, Show)

mkPersist defaultCodegenConfig [groundhog|
- entity: Person
|]

-- First, normal build, then profiling build with "-osuf p_o -hisuf p_hi"
main :: IO ()
main = withSqlitePool ":memory:" 10 $ \gCon -> runDbConn (do { runMigration $ migrate (undefined :: Person); res <- foldM (\acc b -> go >>= return . (+) acc) 0 [1..100]; liftIO $ print res}) gCon
  where go = do
          let person = Person (pack "abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc") 22 180
          k <- insert $ person
          b <- get k
          -- Do some math
          (replicateM_ 1000 $ insert person)
          replicateM_ 1000 $ get k

          -- Force the evaluation of the name field by getting its length
          calc <- (foldM (\acc b -> acc `seq` get k >>= return . ((+) acc . T.length . name . fromJust)) 0 [1..10000])
          calc `seq` return calc

Using master: http://imgur.com/IYFyvRS (~~http://i.imgur.com/SHiT1qO.png~~)

Using Master where name :: String http://i.imgur.com/UAcwlVV.png (~~http://i.imgur.com/c3VDqCg.png~~)

Using Groundhog with No Strings (Over 2x speedup): http://i.imgur.com/4ux62Tc.png (~~http://i.imgur.com/JNTdePg.png~~)

Using No Strings but name :: String (About the same as master) http://i.imgur.com/3q5BbXA.png (~~http://i.imgur.com/hCROyjw.png~~)

So when it comes to text processing I guess it is a bit more than a bump. Seeing as I would suspect most haskellers would use Text for their programs, this would mean an almost 2x speed up.

Edit:

There was a memory bug before, but I added accseq and re ran the benchmarks.

hansonkd avatar Sep 07 '15 20:09 hansonkd

Thank you, it is an interesting experiment. It looks like the speedup comes mostly from adding Text to PersistValue.

The type Utf8 has somewhat unfortunate name. I did not intend to expose it to the general API, it was just a wrapper for the Builder used for SQL rendering. This synonym helped experimenting with several other datatypes before I chose Builder.

Can you create a pull request that replaces PersistString with PersistText? Then we can see how replacing String with Text in entity description affects performance. It is a big change that needs some more thought.

lykahb avatar Sep 09 '15 19:09 lykahb

I agree. It is a big change (It started with PersistValue and then I kinda got carried away). There are essentially 3 parts to this:

  1. ) PerstistValue
  2. ) Internal field representations
  3. ) SQL representation.

Changing PersistString to PersistText is where probably 90% of the performance boost came from. But there is more to moving off strings completely than just performance. Many backends support UTF8 table names and fields (interestingly so do Haskell functions).

I will work on the PR for isolating the PersistString changes.

hansonkd avatar Sep 10 '15 05:09 hansonkd

I am going to issue a new major release soon. If you don't have time for PersistText PR now, I can do the change myself and mention you in the commit.

lykahb avatar Sep 27 '15 16:09 lykahb

Thanks, @hansonkd if you rebase your branch on top of HEAD we can see how internal field representation affect performance. If it is noticeable we can do the change. SQL generation has been optimized with Builder for quite a while. https://github.com/lykahb/groundhog/commit/98671a2bc22a3fc31ee9f826b6c374d98b37aa24

lykahb avatar Nov 17 '15 22:11 lykahb