ledger-mode Completion using 100% CPU for 5+ seconds

After the latest update of all dependencies Tab completion in ledger-mode takes more than 5 seconds to display anything after hitting TAB.

Top shows that during the time emacs process is topping to 100% CPU.

My configuration looks like the following:

(use-package ledger-mode
  :ensure t :defer t
  :init (add-hook 'ledger-mode-hook
                  (lambda ()
                    (setq-local ledger-post-amount-alignment-column nil)
                    (setq-local tab-always-indent 'complete)
                    (setq-local completion-cycle-threshold t)
                    (setq-local ledger-complete-in-steps nil) ; or t
                    (end-of-buffer))))

Jan 08 '20 13:01 markokocic

After profiling using emacs profiler it looks like the culprit is ledger-accounts-in-buffer function whici is responsible for exessive memory allocation (130MB as reported by emacs profiler) and 50 % of CPU. The rest of the CPU is spent in GC, probably due to the high memory consumption of the function. My transaction file is ~950 KB long.

Note that everything was working fine (performance wise) up until recently.

Jan 08 '20 17:01 markokocic

To reply to myself, I did some tinkering and reverted the ledger-accounts-list-in-buffer to avoid calling the new ledger-accounts-in-buffer function.

(defun ledger-accounts-list-in-buffer ()
  "Return a list of all known account names in the current buffer as strings.
Considers both accounts listed in postings and those declared with \"account\" \
directives."
  (save-excursion
    (goto-char (point-min))
    (let (results)
      (while (re-search-forward ledger-account-name-or-directive-regex nil t)
        (setq results (cons (match-string-no-properties 1) results)))
      (sort (delete-dups results) #'string-lessp))))

After redefining the function, Tab completion is fast again. Memory usage decreased 10 times, back to 12MB in my case, and CPU to subseconds without spending any time in GC.

I will not submit a patch here since I suppose that the goal is not to revert to the old implementation, but to make the new implementation fast.

Jan 08 '20 17:01 markokocic

Thanks for taking the time to dig into this. Hey @jabranham, I think this is probably related to your changes in 5761883b09e0ec79188f7d037a7ff611239210f7?

Jan 08 '20 23:01 purcell

@markokocic Thanks for the bug report. Do you define your accounts with account directives or not? Can you share the full report of the profiler?

Jan 09 '20 01:01 jabranham

@jabranham I don't define any accounts in ledger file.

Attached are the reports created by profiler-start and profiler-report. The numbers are not always the same, since it's not always clear when and how much GC will trigger, but they are similar.

profiler-cpu.txt profiler-memory.txt

After switching to the old implementation from before 5761883 performance is again acceptable.

Jan 09 '20 09:01 markokocic

After playing a bit more in it, it looks like problems are caused by using member to check if account is already in the list of account.

It appears it's much faster to just add everything and then remove duplicates from the list.

 (defun ledger-accounts-in-buffer ()
  (save-excursion
    (goto-char (point-min))
    (let (account-list)
      ;; First, consider accounts declared with "account" directives, which may or
      ;; may not have associated data. The data is on the following lines up to a
      ;; line not starting with whitespace.
      (while (re-search-forward ledger-account-directive-regex nil t)
    (let ((account (match-string-no-properties 1))
          (lines (buffer-substring-no-properties
              (point)
              (progn (ledger-navigate-next-xact-or-directive)
                 (point))))
          data)
      (dolist (d (split-string lines "\n"))
        (setq d
          ;; TODO: This is basically (string-trim d) but string-trim
          ;; doesn't exist in Emacs 24. Replace once we drop Emacs 24.
          (if (string-match "[[:space:]]+" d)
              (substring d (match-end 0))
                      d))
        (unless (string= d "")
          (if (string-match " " d)
          (push (cons (substring d 0 (match-beginning 0))
                  (substring d (match-end 0) nil))
            data)
                  (push (cons d nil) data))))
      (push (cons account data) account-list)))
      ;; Next, gather all accounts declared in postings
      (unless
      ;; FIXME: People who have set `ledger-flymake-be-pedantic' to non-ni l
      ;; probably don't want accounts from postings, just those declared
      ;; with directives.  But the name is a little misleading.  Should we
      ;; make a ledger-mode-be-pedantic and use that instead?
      (bound-and-true-p ledger-flymake-be-pedantic)
    (goto-char (point-min))
    (while (re-search-forward ledger-account-name-or-directive-regex nil t)
      (let ((account (match-string-no-properties 1)))
        (setq account-list (cons (cons account nil) account-list))))
    ;; (unless (member account (mapcar #'car account-list))
    ;; (push (cons account nil) account-list)))))
    ;; (sort account-list (lambda (a b) (string-lessp (car a) (car b)))))))
    (sort (delete-dups account-list) (lambda (a b) (string-lessp (car a) (car b ))))))))

This ist still a bit slower then the old implementation, but it's close enough.

Jan 09 '20 10:01 markokocic

Makes sense. I've applied that change in a9232257.

Jan 09 '20 21:01 purcell

Now it's better, but still not as fast as before.

18e419abf4f introduces the noticeable lag again. It looks like cl-delete-duplicates is much slower than delete-dups

Jan 10 '20 08:01 markokocic

Ah okay, but your change alone broke the unit tests (see here) because the deduplication was done on the full cons in the list, not the first element of the cons, ie. the account name. I wonder if there's a formulation which gives the correct behaviour with the speed of delete-dups. Perhaps in seq...

Jan 10 '20 09:01 purcell

Hi @purcell, good catch.

On the other hand, I don't see why ledger-accounts-in-buffer returns a list of conses instead of simple list of account name strings.

The only place in ledger-mode where ledger-accounts-in-buffer is used is inside ledger-accounts-list-in-buffer function, and there only account name list is extracted fro the cons using car and the rest is ignored.

All this could be easily trimmed to only collect and store account names in the account list, and avoid extra list manipulation and allocation.

Current implementation makes sense only if you consider building up additional functionality that requires additional account data, but even then I would argue that having fast implementation for completion purpose would warrant maintaining separate "fast" and "full" account extraction functions.

Jan 10 '20 09:01 markokocic

Current implementation makes sense only if you consider building up additional functionality that requires additional account data

Yes, that's the intent if e.g. you want to filter out accounts based on some criteria.

I should have time this weekend to look into this some more. Hopefully there's a way to speed up the implementation. I wonder why it's so slow on your machine. Do you have a very old machine or a lot of accounts, maybe?

Sorry for closing, hit wrong button.

Jan 10 '20 12:01 jabranham

hi @jabranham

The speed issues comes for me on multiple machines, even on my 16 GB RAM, 4 Core i5 CPU. I keep everything in one file, it has ~350 accounts, ~2500 payees and 10000 transactions. Not huge, but big enough to notice the difference.

Jan 10 '20 12:01 markokocic

@jabranham @purcell

I tried to play a little bit more with it. I switched to hash-map for account list instead of using a list of conses. This was much faster, but still not as sleek as the original version of just using list of strings.

For a reference I'm pasting the code here if you want to explore the idea.

(require 'subr-x) ; Emacs 24.4+, used for hash-table-keys

(defun ledger-accounts-in-buffer ()
  (save-excursion
    (goto-char (point-min))
    (let ((account-list (make-hash-table :size 500)))
      (while (re-search-forward ledger-account-directive-regex nil t)
        (let ((account (match-string-no-properties 1))
              (lines (buffer-substring-no-properties
                      (point)
                      (progn (ledger-navigate-next-xact-or-directive)
                             (point))))
              data)
          (dolist (d (split-string lines "\n"))
            (setq d
                  (if (string-match "[[:space:]]+" d)
                      (substring d (match-end 0))
                    d))
            (unless (string= d "")
                (if (string-match " " d)
                  (push (cons (substring d 0 (match-beginning 0))
                              (substring d (match-end 0) nil))
                        data)
                (push (cons d nil) data))))
          (puthash account data account-list)))

    ;; Next, gather all accounts declared in postings
    (unless
        (bound-and-true-p ledger-flymake-be-pedantic)
      (goto-char (point-min))
      (while (re-search-forward ledger-account-name-or-directive-regex nil t)
        (let ((account (match-string-no-properties 1)))
          (puthash account t account-list))))
    account-list)))

(defun ledger-accounts-list-in-buffer ()
  (let ((accounts (hash-table-keys (ledger-accounts-in-buffer))))
    (when ledger-accounts-exclude-function
      (setq accounts (cl-remove-if ledger-accounts-exclude-function accounts)))
    (sort accounts #'string-lessp))) ; not sure if sorting is required here

Also, adding some kind of per buffer caching of the accounts could almost completely mitigate the issues, since cache would need to be created once when loading file, and updated when file saved (or transaction edited/added/removed).

Jan 10 '20 13:01 markokocic

I pushed a commit that speeds up account completion ~50% on my end. I suspect that's still not fast enough for you, but let me know how it performs.

I'd rather avoid hash tables and/or caching if we can, but if we need to I guess we can move to that.

Jan 11 '20 13:01 jabranham

Looks like the performance is now dominated by cl--adjoin.

After this change I can see in Emacs profiler the performance of ~1308 CPU samples and ~20 MB memory usage, most of it allocated also by cl--adjoin.

Hashtable implementation takes ~500 CPU samples and ~32 MB memory.

The old implementation (list of strings) is still by far the best, ~300 CPU samples and ~13 MB memory allocation

Jan 13 '20 09:01 markokocic

On the other hand, hitting TAB once, for one completion triggers 4 calls to ledger-accounts-list-in-buffer which in a way triggers ledger-accounts-in-buffer 4 times.

Removing the unnecessary calls to the function may give 4x times perf improvement.

Jan 13 '20 09:01 markokocic

hitting TAB once, for one completion triggers 4 calls to ledger-accounts-list-in-buffer

Hmm, that sounds sub-optimal! I had a look at the code and it wasn't obvious why that might be.

Jan 14 '20 07:01 purcell

It's been a while since I use ledger-mode without any modification and don't have any performance issues.

Jan 10 '23 17:01 markokocic