hledger icon indicating copy to clipboard operation
hledger copied to clipboard

1.29: performance loss since 1.29, especially with many accounts

Open markokocic opened this issue 1 year ago • 62 comments

Lately I noticed that hledger was not as fast as it was. It is an issue because some scripts that I have that execute hledger multiple times are becoming very slow.

I decided to benchmark it against ledger.

My system:

marko@fine ~> hledger stats
Main file                : /home/marko/src/ledgers/transactions.ledger
Included files           :
Transactions span        : 2011-11-29 to 2024-01-16 (4431 days)
Last transaction         : 2024-01-15 (3 days from now)
Transactions             : 17385 (3.9 per day)
Transactions last 30 days: 156 (5.2 per day)
Transactions last 7 days : 37 (5.3 per day)
Payees/descriptions      : 3382
Accounts                 : 470 (depth 6)
Commodities              : 4 ($, inr, rsd, €)
Market prices            : 0 ()

Run time (throughput)    : 3.45s (5035 txns/s)

Balance command

marko@fine ~> time hledger bal > /dev/null

________________________________________________________
Executed in    4.07 secs    fish           external
   usr time    3.91 secs  878.00 micros    3.91 secs
   sys time    0.18 secs  185.00 micros    0.18 secs

Then I converted my journal file in order to use it with ledger:

marko@fine ~> hledger print --round=soft > test.journal

And tested it again with ledger

marko@fine ~> time ledger bal -f test.journal > /dev/null

________________________________________________________
Executed in  381.00 millis    fish           external
   usr time  344.67 millis    1.35 millis  343.31 millis
   sys time   36.13 millis    0.20 millis   35.93 millis

The result is that for balance operation, hledger takes 10 times more time that ledger.

My system is a small Linux VM with 2 CPU and 4 GB RAM. Hledger is the latest one built from source.

Other commands are also slower. Notice above that hledger print took 3.45 seconds. In ledger it's less than 1 second.

marko@fine ~> time ledger print -f test.journal > /dev/null

________________________________________________________
Executed in    1.13 secs      fish           external
   usr time  984.76 millis    1.29 millis  983.47 millis
   sys time  144.16 millis    0.23 millis  143.92 millis

markokocic avatar Jan 12 '24 10:01 markokocic

Thanks for this useful data @markokocic. See https://hledger.org/ledger.html#performance - Ledger was always up to 10x faster with "big data" in certain reports. Usually I compare our performance relative to past versions as well as Ledger. Here's what I get just now on a macbook air m1:

~/src/hledger$ just --set BENCHEXES ledger,hledger-1.29,hledger-1.30,hledger-1.31,hledger-1.32 bench 
Running benchmarks with ledger,hledger-1.29,hledger-1.30,hledger-1.31,hledger-1.32 (times are approximate, can be skewed):
Running 4 tests 1 times with 5 executables at 2024-01-12 07:08:16 HST:

Best times:
+----------------------------------------------------++--------+--------------+--------------+--------------+--------------+
|                                                    || ledger | hledger-1.29 | hledger-1.30 | hledger-1.31 | hledger-1.32 |
+====================================================++========+==============+==============+==============+==============+
| -f examples/10000x1000x10.journal print            ||   5.48 |         0.80 |         0.85 |         0.86 |         0.89 |
| -f examples/10000x1000x10.journal register         ||  16.88 |         5.04 |         5.43 |         6.25 |         4.89 |
| -f examples/10000x1000x10.journal balance          ||   0.42 |         0.83 |         0.85 |         0.86 |         0.80 |
| -f examples/10000x1000x10.journal balance --weekly ||   0.40 |        11.19 |         9.42 |        10.19 |         9.94 |
+----------------------------------------------------++--------+--------------+--------------+--------------+--------------+

These benchmarks are using the 10k-transaction sample file, about half the size of yours:

~/src/hledger$ hledger-1.32 -f examples/10000x1000x10.journal stats
Main file                : /Users/simon/src/hledger/examples/10000x1000x10.journal
Included files           : 
Transactions span        : 2000-01-01 to 2027-05-19 (10000 days)
Last transaction         : 2027-05-18 (1222 days from now)
Transactions             : 10000 (1.0 per day)
Transactions last 30 days: 31 (1.0 per day)
Transactions last 7 days : 8 (1.1 per day)
Payees/descriptions      : 10000
Accounts                 : 1000 (depth 10)
Commodities              : 26 (A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z)
Market prices            : 10000 (A)

Run time (throughput)    : 0.76s (13157 txns/s)

hledger vs hledger:

  • This is a faster machine than yours but we'd expect the performance change over time to be similar.
  • I'm not seeing a slowdown in recent hledger.
  • Two years ago with 1.24.99 (see doc) I was achieving 23k txns/s on this machine, and I haven't seen that recently. Possibly a slowdown since then, or some other explanation..

hledger vs Ledger:

  • This brew-installed Ledger binary might be wrongly built in debug mode and not showing its true speed; I seem to remember that being an issue. Though, we have the impression the Ledger formula is built with proper release options..

simonmichael avatar Jan 12 '24 17:01 simonmichael

What are the details of your ledger and hledger installations - optimised system packages ? Which distro ?

Could you check memory usage ? Perhaps hledger is using all of yours and swapping ? Here are three ways:

  • use GNU time (/usr/bin/time maybe)
  • add +RTS -s -RTS to hledger commands
  • watch while it runs in a fast-updating top

simonmichael avatar Jan 12 '24 17:01 simonmichael

hi @simonmichael

I wasn't under the impression that my ledger file is considered big :)

I am using Debian unstable. Ledger is the one that comes with it. Hledger is built from git using stack install

marko@fine:~$ time ledger b -f test.journal > /dev/null

real    0m0.426s
user    0m0.366s
sys     0m0.061s
marko@fine:~$ time hledger bal -f test.journal > /dev/null

real    0m3.610s
user    0m3.443s
sys     0m0.185s
marko@fine:~$ ledger --version
Ledger 3.3.2-20230330, the command-line accounting tool
without support for gpg encrypted journals and with Python support

Copyright (c) 2003-2023, John Wiegley.  All rights reserved.

This program is made available under the terms of the BSD Public License.
See LICENSE file included with the distribution for details and disclaimer.
marko@fine:~$ hledger --version
hledger 1.32.99-g76209d576-20240109, linux-x86_64
marko@fine:~$
marko@fine:~$ time hledger +RTS -s -RTS bal -f test.journal > /dev/null
   3,737,626,528 bytes allocated in the heap
     357,050,840 bytes copied during GC
      60,780,704 bytes maximum residency (9 sample(s))
       1,845,336 bytes maximum slop
             157 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       897 colls,     0 par    0.471s   0.473s     0.0005s    0.0090s
  Gen  1         9 colls,     0 par    0.510s   0.511s     0.0567s    0.1683s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.002s  (  0.002s elapsed)
  MUT     time    2.238s  (  2.219s elapsed)
  GC      time    0.982s  (  0.984s elapsed)
  EXIT    time    0.001s  (  0.005s elapsed)
  Total   time    3.223s  (  3.210s elapsed)

  Alloc rate    1,669,829,075 bytes per MUT second

  Productivity  69.4% of total user, 69.1% of total elapsed


real    0m3.229s
user    0m3.049s
sys     0m0.193s
marko@fine:~$

I then ran hledger again multiple times and watched memory usage in top, and its virtual memory was around ~1gb, and resident maxed around 150-180 M. There was enough RAM available in VM and swap was not used.

markokocic avatar Jan 15 '24 11:01 markokocic

Yours is not "big data" :), but it's bigger than the files most users typically work with (I am guessing most of us split by year to avoid such slowness). And it's big enough where traditionally it's expected that Ledger will be faster.

I haven't yet been able to reproduce this on mac though. I'll try it on a GNU/linux machine.

Your initial report said that hledger itself is getting slower; this would be more concerning. You can check that by benchmarking with some BENCHEXES like I did above.

simonmichael avatar Jan 15 '24 16:01 simonmichael

Any more info on this @markokocic ? Can you show a slowdown in recent hledger versions ?

simonmichael avatar Jan 22 '24 19:01 simonmichael

hi @simonmichael , sorry for the delay, compiling hledger from source on my system takes a bit of time :)

Here I compiled from source using stack install hledger current git head and branches 1.28 and 1.30 and did a quick comparison using my journal. I run every command multiple times and here are representative samples.

Here are some timings (with warm filesystem cache):

Print command:

marko@fine:~$ time hledger-1.28 print > /dev/null

real    0m2.326s
user    0m2.169s
sys     0m0.158s
marko@fine:~$ time hledger-1.30 print > /dev/null

real    0m3.506s
user    0m3.331s
sys     0m0.194s
marko@fine:~$ time hledger-upstream print > /dev/null

real    0m3.845s
user    0m3.632s
sys     0m0.222s

Balance command:

marko@fine:~$ time hledger-1.28 bal > /dev/null

real    0m2.101s
user    0m1.953s
sys     0m0.157s
marko@fine:~$ time hledger-1.30 bal > /dev/null

real    0m3.061s
user    0m2.906s
sys     0m0.169s
marko@fine:~$ time hledger-upstream bal > /dev/null

real    0m3.150s
user    0m2.928s
sys     0m0.237s

As you can see, hledger 1.30 is significantly slower than 1.28, and current git again a bit slower than that.

If you want me to try a specific build, let me know, and I'll compile and run it.

markokocic avatar Jan 23 '24 15:01 markokocic

Thank you! When you have time, would you mind repeating those timings with examples/10000x10000x10.journal ? You can make that with just samplejournals or make samplejournals.

simonmichael avatar Jan 23 '24 16:01 simonmichael

Balance:

marko@fine:~/src/cvstree/hledger$ time hledger-1.28 bal -f examples/10000x10000x10.journal > /dev/null

real    0m3.788s
user    0m3.664s
sys     0m0.137s
marko@fine:~/src/cvstree/hledger$ time hledger-1.28 bal -f examples/10000x10000x10.journal > /dev/null

real    0m2.378s
user    0m2.251s
sys     0m0.133s
marko@fine:~/src/cvstree/hledger$ time hledger-1.30 bal -f examples/10000x10000x10.journal > /dev/null

real    0m10.405s
user    0m10.304s
sys     0m0.153s
marko@fine:~/src/cvstree/hledger$ time hledger-1.30 bal -f examples/10000x10000x10.journal > /dev/null

real    0m10.616s
user    0m10.517s
sys     0m0.153s
marko@fine:~/src/cvstree/hledger$ time hledger-upstream bal -f examples/10000x10000x10.journal > /dev/null

real    0m11.772s
user    0m11.631s
sys     0m0.202s
marko@fine:~/src/cvstree/hledger$ time hledger-upstream bal -f examples/10000x10000x10.journal > /dev/null

real    0m10.661s
user    0m10.518s
sys     0m0.194s

Print:

marko@fine:~/src/cvstree/hledger$ time hledger-1.28 print -f examples/10000x10000x10.journal > /dev/null

real    0m1.890s
user    0m1.786s
sys     0m0.105s
marko@fine:~/src/cvstree/hledger$ time hledger-1.28 print -f examples/10000x10000x10.journal > /dev/null

real    0m1.604s
user    0m1.523s
sys     0m0.081s
marko@fine:~/src/cvstree/hledger$ time hledger-1.30 print -f examples/10000x10000x10.journal > /dev/null

real    0m9.865s
user    0m9.719s
sys     0m0.189s
marko@fine:~/src/cvstree/hledger$ time hledger-1.30 print -f examples/10000x10000x10.journal > /dev/null

real    0m9.911s
user    0m9.809s
sys     0m0.146s
marko@fine:~/src/cvstree/hledger$ time hledger-upstream print -f examples/10000x10000x10.journal > /dev/null

real    0m11.398s
user    0m11.236s
sys     0m0.217s
marko@fine:~/src/cvstree/hledger$ time hledger-upstream print -f examples/10000x10000x10.journal > /dev/null

real    0m9.836s
user    0m9.739s
sys     0m0.141s

Here is the difference even more drastical than in my journal file. The drop in 1.30 is even more visible. I'm building 1.29 right now, will post the results later.

markokocic avatar Jan 23 '24 16:01 markokocic

Ok, just built 1.29 and repeated the same test:

marko@fine:~/src/cvstree/hledger$ time hledger-1.29 print -f examples/10000x10000x10.journal > /dev/null

real    0m10.524s
user    0m10.429s
sys     0m0.141s
marko@fine:~/src/cvstree/hledger$ time hledger-1.29 print -f examples/10000x10000x10.journal > /dev/null

real    0m9.934s
user    0m9.577s
sys     0m0.401s
marko@fine:~/src/cvstree/hledger$ time hledger-1.29 print -f examples/10000x10000x10.journal > /dev/null

real    0m10.761s
user    0m10.634s
sys     0m0.170s
marko@fine:~/src/cvstree/hledger$ time hledger-1.29 bal -f examples/10000x10000x10.journal > /dev/null

real    0m12.260s
user    0m12.147s
sys     0m0.166s
marko@fine:~/src/cvstree/hledger$ time hledger-1.29 bal -f examples/10000x10000x10.journal > /dev/null

real    0m10.653s
user    0m10.045s
sys     0m0.661s

As the results show, the main slowdown started with 1.29. After that, it became a bit slower, but not so significantly.

markokocic avatar Jan 23 '24 18:01 markokocic

Thanks! Very strange. I'll try to reproduce it on other platforms.

simonmichael avatar Jan 23 '24 18:01 simonmichael

Benchmarking is hard. I cannot reproduce those dramatic slowdowns; I feel there must be a problem/difference with those binaries - built unoptimised ? built with different GHC version ? - or the slower runs were swapping or disturbed by other processes ?

Here are the most representative numbers of my tests on GNU/linux. (I tested with 5000 transactions since this VPS won't run the 10k tests.) I see a mild slowdown in print and balance in 1.28 and another in 1.29 :

root@localhost:/opt/hledger# just bench -f bench5k.sh -w hledger-1.27,hledger-1.28,hledger-1.29,hledger-1.30,hledger-1.31,hledger-1.32 -N3 -n2                                                                                                                                          
Running quick benchmarks (times are approximate, can be skewed):                                                                                                                                                                                                                        
Running 3 tests 2 times with 6 executables at 2024-01-24 23:34:49 HST:                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                                                        
Best times 1:                                                                                                                                                                                                                                                                           
+-------------------------------------------++--------------+--------------+--------------+--------------+--------------+--------------+                                                                                                                                                
|                                           || hledger-1.27 | hledger-1.28 | hledger-1.29 | hledger-1.30 | hledger-1.31 | hledger-1.32 |                                                                                                                                                
+===========================================++==============+==============+==============+==============+==============+==============+                                                                                                                                                
| -f examples/5000x1000x10.journal print    ||         0.59 |         0.68 |         0.87 |         0.87 |         0.89 |         0.88 |                                                                                                                                                
| -f examples/5000x1000x10.journal register ||         3.53 |         3.58 |         3.37 |         3.65 |         3.51 |         3.97 |                                                                                                                                                
| -f examples/5000x1000x10.journal balance  ||         0.56 |         0.66 |         0.84 |         0.84 |         0.84 |         0.87 |                                                                                                                                                
+-------------------------------------------++--------------+--------------+--------------+--------------+--------------+--------------+                                                                                                                                                

simonmichael avatar Jan 25 '24 09:01 simonmichael

For interest, here are my best representative results on this mac. I don't have hledger 1.27 and 1.28 here because there's something wrong with those executables, they are abnormally slow. I have included homebrew's Ledger binary, and another built from source. These show

  • mild slowdowns in all three reports in hledger 1.30 and 1.31 compared to 1.29, with speed mostly restored in 1.32
  • ledger balance 2-3x faster than hledger, ledger print 6x slower and ledger register 3x slower with 10k transactions.
~/src/hledger$ just bench -f bench10k.sh -w hledger-1.29,hledger-1.30,hledger-1.31,hledger-1.32,ledger.bin,ledger.src                     
Running quick benchmarks (times are approximate, can be skewed):                                                                          
Running 3 tests 1 times with 6 executables at 2024-01-24 23:48:59 HST:                                                                    
                                                                                                                                          
Best times:                                                                                                                               
+--------------------------------------------++--------------+--------------+--------------+--------------+------------+------------+     
|                                            || hledger-1.29 | hledger-1.30 | hledger-1.31 | hledger-1.32 | ledger.bin | ledger.src |     
+============================================++==============+==============+==============+==============+============+============+     
| -f examples/10000x1000x10.journal print    ||         0.79 |         1.17 |         1.30 |         1.01 |       6.90 |       6.07 |     
| -f examples/10000x1000x10.journal register ||         4.14 |         5.96 |         5.03 |         5.04 |      16.07 |      18.80 |     
| -f examples/10000x1000x10.journal balance  ||         0.81 |         1.23 |         1.23 |         0.88 |       0.37 |       0.36 |     
+--------------------------------------------++--------------+--------------+--------------+--------------+------------+------------+     

simonmichael avatar Jan 25 '24 10:01 simonmichael

And here are the github release binaries since 1.27, on mac. These are intel binaries so run a bit slower than the arm binaries above. They seem to show a 20-30% slowdown in print and balance since 1.29, with 10k transactions.

+--------------------------------------------++------------------+------------------+------------------+------------------+------------------+------------------+
|                                            || hledger-1.27-x64 | hledger-1.28-x64 | hledger-1.29-x64 | hledger-1.30-x64 | hledger-1.31-x64 | hledger-1.32-x64 |
+============================================++==================+==================+==================+==================+==================+==================+
| -f examples/10000x1000x10.journal print    ||             0.88 |             0.88 |             1.23 |             1.18 |             1.21 |             1.24 |
| -f examples/10000x1000x10.journal register ||             5.74 |             6.09 |             4.97 |             5.36 |             5.38 |             5.16 |
| -f examples/10000x1000x10.journal balance  ||             0.93 |             0.91 |             1.15 |             1.18 |             1.18 |             1.21 |
+--------------------------------------------++------------------+------------------+------------------+------------------+------------------+------------------+

simonmichael avatar Jan 25 '24 10:01 simonmichael

All my tests are done on a Debian sid Linux VPS with 2 CPU and 4 GB RAM. I built everything from the source code using stack install hledger from an appropriate branch, using whatever ghc was pulled by the build system. While running I was checking memory, and there was no swapping in any of the cases, neither with my journal, nor with 10000x10000x10 test journal.

Just compiled also 1.27, and run the tests again. print and bal commands both show similar speed in 1.27 and 1.28, a huge slowdown in 1.29 and no big difference in 1.30 and git tip.

When I look at your results in #issuecomment-1909772545 I can see there is also a big ~40-50% increase in runtime between 1.28 and 1.29. Maybe not so drastic like in my case because of the smaller test.

Even your mac tests show the same 50% runtime increase for print and balance in 1.29 compared to the previous version.

The question is, why performance dropped in 1.29? Could be ghc version that caused the slowdown, or some of the library dependencies, or some of the code. It would be interesting to do a git bisect between 1.28 and 1.29 to find a commit that is responsible for the most of the performance degradation.

Why is the difference bigger in my case? Probably because my VPS has only 2 CPU cores while your Mac has more, and can parallelise better? Or because it's faster overall. Just a wild guess, not a haskell developer.

markokocic avatar Jan 25 '24 10:01 markokocic

I notice that env time (/usr/bin/time) shows memory and swap usage, which is handy.

You're right that my numbers tend to show a slowdown that's actually more like 30-50% (I miscounted it above), and somewhere around 1.29. If we squint. But certainly not a 4-5x slowdown like you're showing. I can't explain that difference right now other than the ideas already mentioned... I don't think it's mac-related since I also tested on linux/x64. And I don't think hledger can use more than one core.

Why the performance drop around 1.29, well that's where we must turn to profiling (or bisecting but that might be rather slow). (I see no obvious causes of slowness in the release notes; hopefully we'll find it's just an accidental pessimisation..)

simonmichael avatar Jan 25 '24 11:01 simonmichael

Another idea - it could possibly be a GHC bug on your platform ?

I would check the output of hledger +RTS --info for each of your executables. That will show the GHC version they were built with (and I'd compare the other stuff too just in case). Then we could check GHC known issues, and/or you could try building the slow executables with some different GHC versions.

simonmichael avatar Jan 25 '24 11:01 simonmichael

Hi @simonmichael , here's the output:

perf issues across platforms are hard to debug, but something tells me if you were able to cut off these 30-50% on your system, it may automagically solve 4-5x slowdown on my system.

hlegder info

marko@fine ~> hledger-1.27 +RTS --info
 [("GHC RTS", "YES")
 ,("GHC version", "9.2.4")
 ,("RTS way", "rts_thr")
 ,("Build platform", "x86_64-unknown-linux")
 ,("Build architecture", "x86_64")
 ,("Build OS", "linux")
 ,("Build vendor", "unknown")
 ,("Host platform", "x86_64-unknown-linux")
 ,("Host architecture", "x86_64")
 ,("Host OS", "linux")
 ,("Host vendor", "unknown")
 ,("Target platform", "x86_64-unknown-linux")
 ,("Target architecture", "x86_64")
 ,("Target OS", "linux")
 ,("Target vendor", "unknown")
 ,("Word size", "64")
 ,("Compiler unregisterised", "NO")
 ,("Tables next to code", "YES")
 ,("Flag -with-rtsopts", "")
 ]
marko@fine ~> hledger-1.28 +RTS --info
 [("GHC RTS", "YES")
 ,("GHC version", "9.2.5")
 ,("RTS way", "rts_thr")
 ,("Build platform", "x86_64-unknown-linux")
 ,("Build architecture", "x86_64")
 ,("Build OS", "linux")
 ,("Build vendor", "unknown")
 ,("Host platform", "x86_64-unknown-linux")
 ,("Host architecture", "x86_64")
 ,("Host OS", "linux")
 ,("Host vendor", "unknown")
 ,("Target platform", "x86_64-unknown-linux")
 ,("Target architecture", "x86_64")
 ,("Target OS", "linux")
 ,("Target vendor", "unknown")
 ,("Word size", "64")
 ,("Compiler unregisterised", "NO")
 ,("Tables next to code", "YES")
 ,("Flag -with-rtsopts", "")
 ]
marko@fine ~> hledger-1.29 +RTS --info
 [("GHC RTS", "YES")
 ,("GHC version", "9.4.4")
 ,("RTS way", "rts_thr")
 ,("Build platform", "x86_64-unknown-linux")
 ,("Build architecture", "x86_64")
 ,("Build OS", "linux")
 ,("Build vendor", "unknown")
 ,("Host platform", "x86_64-unknown-linux")
 ,("Host architecture", "x86_64")
 ,("Host OS", "linux")
 ,("Host vendor", "unknown")
 ,("Target platform", "x86_64-unknown-linux")
 ,("Target architecture", "x86_64")
 ,("Target OS", "linux")
 ,("Target vendor", "unknown")
 ,("Word size", "64")
 ,("Compiler unregisterised", "NO")
 ,("Tables next to code", "YES")
 ,("Flag -with-rtsopts", "")
 ]
marko@fine ~> hledger-1.30 +RTS --info
 [("GHC RTS", "YES")
 ,("GHC version", "9.4.5")
 ,("RTS way", "rts_thr")
 ,("Build platform", "x86_64-unknown-linux")
 ,("Build architecture", "x86_64")
 ,("Build OS", "linux")
 ,("Build vendor", "unknown")
 ,("Host platform", "x86_64-unknown-linux")
 ,("Host architecture", "x86_64")
 ,("Host OS", "linux")
 ,("Host vendor", "unknown")
 ,("Target platform", "x86_64-unknown-linux")
 ,("Target architecture", "x86_64")
 ,("Target OS", "linux")
 ,("Target vendor", "unknown")
 ,("Word size", "64")
 ,("Compiler unregisterised", "NO")
 ,("Tables next to code", "YES")
 ,("Flag -with-rtsopts", "")
 ]
marko@fine ~> hledger-upstream +RTS --info
 [("GHC RTS", "YES")
 ,("GHC version", "9.6.3")
 ,("RTS way", "rts_thr")
 ,("Build platform", "x86_64-unknown-linux")
 ,("Build architecture", "x86_64")
 ,("Build OS", "linux")
 ,("Build vendor", "unknown")
 ,("Host platform", "x86_64-unknown-linux")
 ,("Host architecture", "x86_64")
 ,("Host OS", "linux")
 ,("Host vendor", "unknown")
 ,("Target platform", "x86_64-unknown-linux")
 ,("Target architecture", "x86_64")
 ,("Target OS", "linux")
 ,("Target vendor", "unknown")
 ,("Word size", "64")
 ,("Compiler unregisterised", "NO")
 ,("Tables next to code", "YES")
 ,("Flag -with-rtsopts", "")
 ]

So yes, the switch to ghc 9.4 made it slower, but not sure if it's the cause, since newer versions didn't make it faster again.

Timings for 1.28, 1.29 and upstream

marko@fine ~/s/c/hledger> /usr/bin/time -v hledger-1.28 bal -f examples/10000x10000x10.journal > /dev/null
        Command being timed: "hledger-1.28 bal -f examples/10000x10000x10.journal"
        User time (seconds): 2.91
        System time (seconds): 0.13
        Percent of CPU this job got: 100%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.03
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 153932
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 32391
        Voluntary context switches: 323
        Involuntary context switches: 21
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
marko@fine ~/s/c/hledger> /usr/bin/time -v hledger-1.29 bal -f examples/10000x10000x10.journal > /dev/null
        Command being timed: "hledger-1.29 bal -f examples/10000x10000x10.journal"
        User time (seconds): 15.07
        System time (seconds): 0.20
        Percent of CPU this job got: 100%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.19
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 150740
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 31272
        Voluntary context switches: 1534
        Involuntary context switches: 146
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
marko@fine ~/s/c/hledger> /usr/bin/time -v hledger-upstream bal -f examples/10000x10000x10.journal > /dev/null
        Command being timed: "hledger-upstream bal -f examples/10000x10000x10.journal"
        User time (seconds): 16.43
        System time (seconds): 0.16
        Percent of CPU this job got: 100%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.50
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 151868
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 18
        Minor (reclaiming a frame) page faults: 32409
        Voluntary context switches: 1675
        Involuntary context switches: 104
        Swaps: 0
        File system inputs: 1824
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

markokocic avatar Jan 25 '24 12:01 markokocic

Good info, thanks. You could try building 1.29 with GHC 9.2 like so:

git switch 1.29-branch
stack install hledger --resolver lts-20.26

simonmichael avatar Jan 25 '24 12:01 simonmichael

No difference between builds with different ghc:

marko@fine ~/s/c/hledger (1.29-branch)> /usr/bin/time -v hledger-1.29 bal -f examples/10000x10000x10.journal > /dev/null
        Command being timed: "hledger-1.29 bal -f examples/10000x10000x10.journal"
        User time (seconds): 16.41
        System time (seconds): 0.25
        Percent of CPU this job got: 100%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.58
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 152776
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 31272
        Voluntary context switches: 1678
        Involuntary context switches: 165
        Swaps: 0
        File system inputs: 8
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
marko@fine ~/s/c/hledger (1.29-branch)> /usr/bin/time -v hledger-1.29_lts2026 bal -f examples/10000x10000x10.journal > /dev/null
        Command being timed: "hledger-1.29_lts2026 bal -f examples/10000x10000x10.journal"
        User time (seconds): 16.45
        System time (seconds): 0.16
        Percent of CPU this job got: 100%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.52
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 135940
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 28070
        Voluntary context switches: 1669
        Involuntary context switches: 87
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

markokocic avatar Jan 25 '24 13:01 markokocic

Here's my report.


I have a large log file that includes timetracking info.

$ wc Log.txt
 16152  64929 481224 Log.txt

hledger 1.30.1, built with ghc-9.4.8, appears to be 10x slower to run a balance calculation than hledger 1.27.1 built with ghc-9.2.8.

$ cat run-bal.sh
hledger -f timedot:$(git root)/Log.txt bal --tree -b 'this week' -DT --sort

$ nix shell github:nixos/nixpkgs/nixos-23.05#hledger -c bench ./run-bal.sh
benchmarking ./run-bal.sh
time                 230.3 ms   (228.3 ms .. 232.6 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 233.7 ms   (231.5 ms .. 238.8 ms)
std dev              4.135 ms   (676.1 μs .. 5.864 ms)
variance introduced by outliers: 14% (moderately inflated)


$ nix shell github:nixos/nixpkgs/nixos-23.11#hledger -c bench ./run-bal.sh
benchmarking ./run-bal.sh
time                 3.325 s    (3.251 s .. 3.382 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 3.290 s    (3.269 s .. 3.307 s)
std dev              20.45 ms   (13.66 ms .. 24.96 ms)
variance introduced by outliers: 19% (moderately inflated)

Version info:

$ nix shell github:nixos/nixpkgs/nixos-23.05#hledger -c hledger --version
hledger 1.27.1, linux-x86_64

$ nix run github:nixos/nixpkgs/nixos-23.05#ghc --  --version
The Glorious Glasgow Haskell Compilation System, version 9.2.8

$ nix shell github:nixos/nixpkgs/nixos-23.11#hledger -c hledger --version
hledger 1.30.1, linux-x86_64

$ nix run github:nixos/nixpkgs/nixos-23.11#ghc --  --version
The Glorious Glasgow Haskell Compilation System, version 9.4.8

chreekat avatar Jan 25 '24 15:01 chreekat

I did a bisect using my test case and 76f8eaf91ab17964114489558f5760246432702b seems to be the commit that introduced the slow-down.

chreekat avatar Jan 25 '24 18:01 chreekat

Thank you !

Whereas I, of course 😅, see no slowdown at that commit (ghc 9.4.8 on mac arm):

~/src/hledger$ just bench -w hledger-preslow,hledger-slow
Running quick benchmarks (times are approximate, can be skewed):
Running 3 tests 1 times with 2 executables at 2024-01-25 09:16:14 HST:

Best times:
+--------------------------------------------++-----------------+--------------+
|                                            || hledger-preslow | hledger-slow |
+============================================++=================+==============+
| -f examples/10000x1000x10.journal print    ||            0.77 |         0.79 |
| -f examples/10000x1000x10.journal register ||            4.94 |         4.16 |
| -f examples/10000x1000x10.journal balance  ||            0.71 |         0.75 |
+--------------------------------------------++-----------------+--------------+
~/src/hledger$ hledger-preslow --version
hledger 1.28.99-g7432769d3-20230125, mac-aarch64
~/src/hledger$ hledger-slow --version
hledger 1.28.99-g76f8eaf91-20230125, mac-aarch64
~/src/hledger$ hledger-preslow +RTS --info | grep version; hledger-slow +RTS --info | grep version
 ,("GHC version", "9.4.8")
 ,("GHC version", "9.4.8")
~/src/hledger$ uname; arch
Darwin
arm64

simonmichael avatar Jan 25 '24 19:01 simonmichael

Doh! :D

Could it have to do with my input? Unfortunately it would take a lot of work to anonymize it or generate similar data...

chreekat avatar Jan 25 '24 19:01 chreekat

I also see no slowdown at that commit when running /path/to/hledger -f hledger/bench/10000x1000x10.journal bal.

The script I was running is

#!/usr/bin/env bash
if ! stack install hledger; then
    exit 125 # Skip
fi

input=${1:-timedot:Log.txt}
time=${2:-1}

timeout $time  ~/.local/bin/hledger -f $input bal --tree -b 'this week' -DT --sort >/dev/null
res=$?
git checkout -- .
exit $res

If I try to run that same command line against the 10000x1000x10 file, I get:

$ time ./bisect-tool.sh hledger/bench/10000x1000x10.journal  10000
Copying from /home/b/src/haskell/hledger/.stack-work/install/x86_64-linux-nix/fe33b48bc8c8eccb9ebf33418ec92466a0c5b999f9b6ebe098111c0b09206ea9/9.4.4/bin/hledger to
/home/b/.local/bin/hledger.
Copied executables to /home/b/.local/bin/:
* hledger

real    0m38.139s
user    0m36.202s
sys     0m1.949s

on the GOOD commit, and

$ time ./bisect-tool.sh hledger/bench/10000x1000x10.journal  10000
Copying from /home/b/src/haskell/hledger/.stack-work/install/x86_64-linux-nix/fe33b48bc8c8eccb9ebf33418ec92466a0c5b999f9b6ebe098111c0b09206ea9/9.4.4/bin/hledger to
/home/b/.local/bin/hledger.
Copied executables to /home/b/.local/bin/:
* hledger

real    0m38.057s
user    0m36.183s
sys     0m1.870s

on the BAD commit. So, no apparent slowdown.

chreekat avatar Jan 25 '24 19:01 chreekat

But I do see a significant slowdown at that commit in GHCI:

7432769d3:

ghci> withArgs (words "-fexamples/1000x1000x10.journal stats") main
Main file                : /Users/simon/src/hledger/examples/1000x1000x10.journal
Included files           : 
Transactions span        : 2000-01-01 to 2002-09-27 (1000 days)
Last transaction         : 2002-09-26 (7791 days ago)
Transactions             : 1000 (1.0 per day)
Transactions last 30 days: 0 (0.0 per day)
Transactions last 7 days : 0 (0.0 per day)
Payees/descriptions      : 1000
Accounts                 : 1000 (depth 10)
Commodities              : 26 (A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z)
Market prices            : 1000 (A)

Run time (throughput)    : 0.61s (1627 txns/s)

76f8eaf91:

ghci> :r
[48 of 88] Compiling Hledger.Read.Common ( hledger-lib/Hledger/Read/Common.hs, interpreted ) [Source file changed]
[87 of 88] Compiling Hledger.Cli      ( hledger/Hledger/Cli.hs, interpreted ) [/Users/simon/src/hledger/.git/index changed]
Ok, 88 modules loaded.
ghci> withArgs (words "-fexamples/1000x1000x10.journal stats") main
Main file                : /Users/simon/src/hledger/examples/1000x1000x10.journal
Included files           : 
Transactions span        : 2000-01-01 to 2002-09-27 (1000 days)
Last transaction         : 2002-09-26 (7791 days ago)
Transactions             : 1000 (1.0 per day)
Transactions last 30 days: 0 (0.0 per day)
Transactions last 7 days : 0 (0.0 per day)
Payees/descriptions      : 1000
Accounts                 : 1000 (depth 10)
Commodities              : 26 (A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z)
Market prices            : 1000 (A)

Run time (throughput)    : 0.89s (1128 txns/s)

Next I tried making some unoptimised builds (stack --resolver lts-21 build hledger --ghc-options=-O0). And tried harder to eliminate randomness from this benchmarking method by measuring best of 2 or 3 runs, and testing 3 times. I notice that quickbench uses a ton of memory, and there seems to still be a little bit of randomness in these numbers, but I think overall they are representative.

The builds, whether optimised or not, showed a much smaller slowdown at that commit. I don't know what all of this means.

With 1k txns, for quick results:

~/src/hledger$ just bench -w hledger-preslow-unopt,hledger-slow-unopt,hledger-preslow,hledger-slow -N3 -n3 -f bench1k.sh 
Running quick benchmarks (times are approximate, can be skewed):
Running 3 tests 3 times with 4 executables at 2024-01-25 09:38:33 HST:

Best times 1:
+-------------------------------------------++-----------------------+--------------------+-----------------+--------------+
|                                           || hledger-preslow-unopt | hledger-slow-unopt | hledger-preslow | hledger-slow |
+===========================================++=======================+====================+=================+==============+
| -f examples/1000x1000x10.journal print    ||                  0.25 |               0.31 |            0.11 |         0.15 |
| -f examples/1000x1000x10.journal register ||                  0.68 |               0.78 |            0.44 |         0.51 |
| -f examples/1000x1000x10.journal balance  ||                  0.26 |               0.34 |            0.13 |         0.15 |
+-------------------------------------------++-----------------------+--------------------+-----------------+--------------+

Best times 2:
+-------------------------------------------++-----------------------+--------------------+-----------------+--------------+
|                                           || hledger-preslow-unopt | hledger-slow-unopt | hledger-preslow | hledger-slow |
+===========================================++=======================+====================+=================+==============+
| -f examples/1000x1000x10.journal print    ||                  0.26 |               0.33 |            0.12 |         0.16 |
| -f examples/1000x1000x10.journal register ||                  0.72 |               0.76 |            0.49 |         0.47 |
| -f examples/1000x1000x10.journal balance  ||                  0.26 |               0.34 |            0.13 |         0.15 |
+-------------------------------------------++-----------------------+--------------------+-----------------+--------------+

Best times 3:
+-------------------------------------------++-----------------------+--------------------+-----------------+--------------+
|                                           || hledger-preslow-unopt | hledger-slow-unopt | hledger-preslow | hledger-slow |
+===========================================++=======================+====================+=================+==============+
| -f examples/1000x1000x10.journal print    ||                  0.26 |               0.33 |            0.11 |         0.15 |
| -f examples/1000x1000x10.journal register ||                  0.72 |               0.75 |            0.45 |         0.47 |
| -f examples/1000x1000x10.journal balance  ||                  0.26 |               0.34 |            0.13 |         0.16 |
+-------------------------------------------++-----------------------+--------------------+-----------------+--------------+

Similar results with 10k txns:

~/src/hledger$ just bench -w hledger-preslow-unopt,hledger-slow-unopt,hledger-preslow,hledger-slow -N3 -n2
Running quick benchmarks (times are approximate, can be skewed):
Running 3 tests 2 times with 4 executables at 2024-01-25 09:40:33 HST:

Best times 1:
+--------------------------------------------++-----------------------+--------------------+-----------------+--------------+
|                                            || hledger-preslow-unopt | hledger-slow-unopt | hledger-preslow | hledger-slow |
+============================================++=======================+====================+=================+==============+
| -f examples/10000x1000x10.journal print    ||                  2.10 |               2.15 |            0.72 |         0.79 |
| -f examples/10000x1000x10.journal register ||                  6.62 |               6.89 |            3.73 |         4.61 |
| -f examples/10000x1000x10.journal balance  ||                  2.02 |               2.09 |            0.68 |         0.74 |
+--------------------------------------------++-----------------------+--------------------+-----------------+--------------+

Best times 2:
+--------------------------------------------++-----------------------+--------------------+-----------------+--------------+
|                                            || hledger-preslow-unopt | hledger-slow-unopt | hledger-preslow | hledger-slow |
+============================================++=======================+====================+=================+==============+
| -f examples/10000x1000x10.journal print    ||                  2.08 |               2.17 |            0.72 |         0.77 |
| -f examples/10000x1000x10.journal register ||                  6.51 |               6.51 |            4.34 |         3.80 |
| -f examples/10000x1000x10.journal balance  ||                  1.98 |               2.09 |            0.69 |         0.74 |
+--------------------------------------------++-----------------------+--------------------+-----------------+--------------+

Best times 3:
+--------------------------------------------++-----------------------+--------------------+-----------------+--------------+
|                                            || hledger-preslow-unopt | hledger-slow-unopt | hledger-preslow | hledger-slow |
+============================================++=======================+====================+=================+==============+
| -f examples/10000x1000x10.journal print    ||                  2.12 |               2.24 |            0.74 |         0.76 |
| -f examples/10000x1000x10.journal register ||                  6.81 |               6.63 |            4.05 |         3.91 |
| -f examples/10000x1000x10.journal balance  ||                  2.03 |               2.11 |            0.72 |         0.78 |
+--------------------------------------------++-----------------------+--------------------+-----------------+--------------+

Similar results with GNU time:

~/src/hledger$ gtime hledger-preslow-unopt -f examples/10000x1000x10.journal bal >/dev/null
1.91user 0.05system 0:02.04elapsed 96%CPU (0avgtext+0avgdata 145040maxresident)k
0inputs+0outputs (1599major+7923minor)pagefaults 0swaps
~/src/hledger$ gtime hledger-slow-unopt -f examples/10000x1000x10.journal bal >/dev/null
2.01user 0.05system 0:02.14elapsed 96%CPU (0avgtext+0avgdata 151568maxresident)k
0inputs+0outputs (1619major+8311minor)pagefaults 0swaps

~/src/hledger$ gtime hledger-preslow -f examples/10000x1000x10.journal bal >/dev/null
0.63user 0.05system 0:00.75elapsed 91%CPU (0avgtext+0avgdata 119248maxresident)k
0inputs+0outputs (1582major+6328minor)pagefaults 0swaps
~/src/hledger$ gtime hledger-slow -f examples/10000x1000x10.journal bal >/dev/null
0.69user 0.05system 0:00.82elapsed 91%CPU (0avgtext+0avgdata 127648maxresident)k
0inputs+0outputs (1599major+6836minor)pagefaults 0swaps
~/src/hledger$ gtime hledger-preslow-unopt -f examples/10000x1000x10.journal bal >/dev/null
1.93user 0.05system 0:02.06elapsed 96%CPU (0avgtext+0avgdata 145088maxresident)k
0inputs+0outputs (1599major+7926minor)pagefaults 0swaps
~/src/hledger$ gtime hledger-slow-unopt -f examples/10000x1000x10.journal bal >/dev/null
2.03user 0.05system 0:02.15elapsed 96%CPU (0avgtext+0avgdata 151504maxresident)k
0inputs+0outputs (1619major+8307minor)pagefaults 0swaps

~/src/hledger$ gtime hledger-preslow -f examples/10000x1000x10.journal bal >/dev/null
0.64user 0.05system 0:00.76elapsed 90%CPU (0avgtext+0avgdata 119168maxresident)k
0inputs+0outputs (1582major+6323minor)pagefaults 0swaps
~/src/hledger$ gtime hledger-slow -f examples/10000x1000x10.journal bal >/dev/null
0.69user 0.04system 0:00.80elapsed 92%CPU (0avgtext+0avgdata 127600maxresident)k
0inputs+0outputs (1599major+6833minor)pagefaults 0swaps

simonmichael avatar Jan 25 '24 20:01 simonmichael

(gtime shows that page faults and max residency are a bit higher with the -slow builds. Perhaps increased memory pressure or differences in OS memory management accentuate the impact of this on different systems ?)

simonmichael avatar Jan 25 '24 20:01 simonmichael

In my case, my input data seems to be having an effect. Fwiw, it looks like this:

#########
# Week 79
#########

2024-01-22
----------

09:36:
       Much refreshed this week. I will start with stackage, but I need to make sure to
       take time to stay up to date. I don't think it's good to fall behind like I did the
       last month.

10:14:
       I finished the curator PR https://github.com/commercialhaskell/curator/pull/39,
       which of course was almost already done anyway.

stackage migration  ...

12:00:
       I was nerd sniped by my inbox, looking at GHCup issues. I need to let them go for a
       while now. I am actually not sure what needs to happen next for stackage. Oh, yes,
       discussion of use of super slurper.

org:in  1.75

13:28:
       I was just looking at disk usage on gitlab.haskell.org. I was confused by the
       output of zfs commands, but I think I get it now.

       DF

       # [bryan@gitlab:~]$ df -h
       # Filesystem                    Size  Used Avail Use% Mounted on
       # rpool/safe/root                48G   29G   19G  61% /

       "Size" is a lie, it's just Used + Avail (always). Therefore Use% is also pointless.

And there's 16,000 lines of it. Furthermore,

$ hledger -f timedot:Log.txt  bal|wc
    126     321    5457

$ hledger -f timedot:Log.txt  reg|wc
   9994   79168 3647855

chreekat avatar Jan 25 '24 20:01 chreekat

At least, I can reliably see the speedup in GHCI when the journalMarkRedundantCosts line is commented in journalFinalise:

ghci> :main -fexamples/1000x1000x10.journal stats
...
Run time (throughput)    : 0.91s (1094 txns/s)
ghci> :r
[49 of 90] Compiling Hledger.Read.Common ( hledger-lib/Hledger/Read/Common.hs, interpreted ) [Source file changed]
Ok, 90 modules loaded.
ghci> :main -fexamples/1000x1000x10.journal stats
...
Run time (throughput)    : 0.59s (1708 txns/s)

simonmichael avatar Jan 25 '24 20:01 simonmichael

Yet another benchmarking method. Apologies, but this build up of benchmarking fu is useful...

~/src/hledger$ for i in `seq 1 10`; do hledger-preslow stats -f examples/1000x1000x10.journal |grep throughput; done
Run time (throughput)    : 0.11s (9076 txns/s)
Run time (throughput)    : 0.09s (11374 txns/s)
Run time (throughput)    : 0.09s (11324 txns/s)
Run time (throughput)    : 0.09s (11284 txns/s)
Run time (throughput)    : 0.09s (11297 txns/s)
Run time (throughput)    : 0.09s (11197 txns/s)
Run time (throughput)    : 0.09s (10884 txns/s)
Run time (throughput)    : 0.09s (11392 txns/s)
Run time (throughput)    : 0.09s (11268 txns/s)
Run time (throughput)    : 0.09s (11284 txns/s)
~/src/hledger$ for i in `seq 1 10`; do hledger-slow stats -f examples/1000x1000x10.journal |grep throughput; done
Run time (throughput)    : 0.24s (4229 txns/s)
Run time (throughput)    : 0.13s (7888 txns/s)
Run time (throughput)    : 0.13s (7927 txns/s)
Run time (throughput)    : 0.13s (7914 txns/s)
Run time (throughput)    : 0.13s (7885 txns/s)
Run time (throughput)    : 0.13s (7748 txns/s)
Run time (throughput)    : 0.13s (7940 txns/s)
Run time (throughput)    : 0.13s (7890 txns/s)
Run time (throughput)    : 0.13s (7959 txns/s)
Run time (throughput)    : 0.13s (7932 txns/s)

simonmichael avatar Jan 25 '24 20:01 simonmichael

Hi @simonmichael , I just noticed you used 10000x1000x10 example while I used 10000x10000x10. Maybe slowdown you see is lower because overall time is lower, and it slows down more with bigger jurnals and longer execution times?

I'll check tomorrow on my system and report here.

markokocic avatar Jan 25 '24 20:01 markokocic