org-ql icon indicating copy to clipboard operation
org-ql copied to clipboard

Delaying mode-hooks when opening Org files for searching

Open Whil- opened this issue 3 years ago • 21 comments

With this change, and when the user customizes org-ql-open-buffers to it's non-default value, org-ql will get a HUGE performance improvement on queries when files aren't already loaded into buffers in Emacs.

This change does not modify any defaults and org-ql will work just as before (and as slow) until org-ql-open-buffers is modified by the user. This is due to backwards compatability.

I've tested the performance on one of my Org roam collections, with 686 files in it. Many small files. In total under 2 MB of text content. The performance with and without this patch is as follows (on Windows 10 PC):

  • Without patch: 3 minutes 11 seconds (191.888988 seconds)
  • With patch: 1.278487 seconds

Test query:

  (org-ql-query
    :select #'(org-get-heading t t t t)
    :from (org-roam--list-all-files)
    :where '(and (property "ID")))

The query lists 507 ID's in total.

The implementation is inspired by how Org ID caching is implemented, and the nroam implementation. I've been involved with the design of both for full disclosure.

Haven't tested this customization very thoroughly, except with running queries on my own machines. But encourage others to try it out and report back if there are any issues!

Whil- avatar Mar 13 '21 20:03 Whil-

You are Gustav Wikstrom, I presume. :)

This is an interesting idea. Is the performance improvement primarily from using delay-mode-hooks? If so, we might be able to apply that separately from using a "work buffer" and gain much of the benefit, but we might have to take care to activate the mode hooks later.

A problem I guess a "work buffer" would have is that, when the results are presented in an org-ql-view buffer, the items' markers would be invalid. That could probably be handled easily enough by replacing the markers with a path-point pair when necessary and updating the relevant code.

Regardless of that, the "work buffer" approach could be useful in, e.g. helm-org-ql, where the results are mostly transient.

Another potential issue with "work buffers" is that some of the Org mode hooks may affect some search results, e.g. setting file-level tags, so delaying those hooks might cause searches to return incorrect results. It might be tricky to account for that.

What do you think? Thanks.

alphapapa avatar Mar 20 '21 22:03 alphapapa

This is an interesting idea. Is the performance improvement primarily from using delay-mode-hooks?

Yes. Haven't profiled the separate parts but that is definitely where the major speedup comes from.

A problem I guess a "work buffer" would have is that, when the results are presented in an org-ql-view buffer, the items' markers would be invalid.

Indeed, one piece of code I have currently tries to look up the filename for each item in the query and that obviously doesn't work with this customization active. Similar incompatability for org-ql-view surely.

Another potential issue with "work buffers" is that some of the Org mode hooks may affect some search results, e.g. setting file-level tags, so delaying those hooks might cause searches to return incorrect results.

Yeah, agreed. The dynamics of Emacs is a bugger and a blessing at the same time. Not sure how to act here really. Maybe still allow for this customization since it's not on by default, and provide a better explanation in the docstring of the risks? It really is a do or die change when the number of files grow large. In that context Org ql is VERY useful with this change, and not so much useful without. And the cases when correctness and mutations (or file lookup) are needed can be catered to case by case (eg. by using let), by temporary changing the customization.

I'm sure more clever people than me can make this integrate better with the rest of Org ql. But wanted to get this out there anyhow due to the huge performance improvement.

Extra remark: I played with the thought to open each file in separate buffers in some kind of delayed mode state. But voted against that because simple read only queries shouldn't leave any (visible) state in Emacs after execution. Running a query on 500+ files doesn't necessarily mean I want to load all those files into my open buffer list. As soon as we leave read-only this breaks down though. But existing functionality isn't changed if all the files that are queried are already loaded into their own buffers.

oh, also couldn't understand why the tests failed on snapshot. But I'm running snapshot Emacs myself now and this change works locally at least.

Whil- avatar Mar 21 '21 00:03 Whil-

Hello, thanks for this PR. I'm interested in it, because when it is merged, I will have to change the code of one of my packages.

Is the performance improvement primarily from using delay-mode-hooks? we might have to take care to activate the mode hooks later.

I've used a similar technique in my org-multi-wiki package, which I think is used exclusively by myself at present. I learned this technique from https://github.com/alphapapa/org-ql/issues/88#issuecomment-570568341 posted by @Kungsgeten. And indeed, the activation issue has existed. Here is a list of workarounds I am using:

  • Use a local variable to track the status of hook run.
  • Always use functions from the package to visit contents in wikis. You have to work around in Helm actions, the org-link follow function, etc.
  • Hide wiki buffers from my buffer list interface, to prevent direct access.
  • ~Add a predicate to recentf-exclude to prevent direct access.~ This is an irrelevant issue.

While the technique definitely improves performance, I'm not sure whether it should be implemented in org-ql. I implemented it in my package which assumes a specific workflow, but to enable it globally in Org, it will be even a harder path. You will probably need something like org-ql-global-mode. On the other hand, once we implement the optimization as part of org-ql, front-end developers, i.e. people who develop packages that utilize org-ql, will no longer have to work around the activation problem in individual packages.

Another potential issue with "work buffers" is that some of the Org mode hooks may affect some search results, e.g. setting file-level tags, so delaying those hooks might cause searches to return incorrect results. It might be tricky to account for that.

I was unaware of this issue, and the global minor mode may not handle this case properly.

akirak avatar Mar 21 '21 02:03 akirak

This is an interesting idea. Is the performance improvement primarily from using delay-mode-hooks?

Yes. Haven't profiled the separate parts but that is definitely where the major speedup comes from.

Is it because of font-lock by any chance? If so, the problem may be fixed in org-mode itself in reasonable future. I have a working patch in https://github.com/yantar92/org.

yantar92 avatar Mar 21 '21 05:03 yantar92

I learned this technique from #88 (comment) posted by @Kungsgeten. And indeed, the activation issue has existed.

Yeah, definitely credit to @Kungsgeten for all of my code as well. Org-brain was first out with this optimization! 👌👍

Whil- avatar Mar 21 '21 07:03 Whil-

Is it because of font-lock by any chance? If so, the problem may be fixed in org-mode itself in reasonable future. I have a working patch in https://github.com/yantar92/org.

I doubt that font-lock is the root to the problem here. Kind of hoping to be proven wrong though!

Looking forward to seeing your font-lock code in Org none the less. As any improvement to Orgs performance is a huge win!

Whil- avatar Mar 21 '21 07:03 Whil-

I doubt that font-lock is the root to the problem here. Kind of hoping to be proven wrong though!

Font-locking in org means that all the links have to be parsed for example. That involves a lot of regex search. In my org files, font-locking took nearly half of the startup time (also thanks to pretty-symbols). In any case, I would be curious to know about startup bottlenecks in your scenario.

Looking forward to seeing your font-lock code in Org none the less. As any improvement to Orgs performance is a huge win!

Any feedback would be appreciated. That patch is complex and font-locking staff is just one of the optimisations.

yantar92 avatar Mar 21 '21 07:03 yantar92

I doubt that font-lock is the root to the problem here. Kind of hoping to be proven wrong though!

I did a benchmark from your branch @yantar92. No performance improvement. Slower actually. Output of the tests: (Run on different hardware than above, hence the difference in time from first post. This test: high-end AMD desktop. Tests in first post: virtual machine on high-end Intel laptop.)

With org-mode master:
Elapsed time: 1.016928s (0.102247s in 3 GCs)
Elapsed time: 115.648919s (3.886891s in 85 GCs)
Elapsed time: 0.726889s (0.054811s in 1 GCs)

With yantar92/org-fold-universal-core:
Elapsed time: 1.184400s (0.213159s in 7 GCs)
Elapsed time: 116.078592s (4.321491s in 70 GCs)
Elapsed time: 0.770061s (0.094068s in 1 GCs)

First row is with new customization on. Second with it off (showing current performance) and third is with all files already loaded into buffers.

I can note that benchmarking org-roam--list-all-files shows that's actually a significant piece of the time spent already there:

(benchmark 1 '(org-roam--list-all-files))

results:

Elapsed time: 0.390924s

These numbers are with my personal setup though, Emacs 28.0.50 on Windows 10 with native compilation on. Surely they can differ in another persons config but I suspect the relative performance difference is around the same. (i.e. around 100x faster with this customization)

Sidenote: Got an error with the org-fold-universal-core branch. error("No match %d in highlight %S" 2 (2 'org-headline-done prepend)) Can discuss that offline if needed, to not take over this pull-request.

Whil- avatar Mar 21 '21 12:03 Whil-

I did a benchmark from your branch @yantar92. No performance improvement. Slower actually. Output of the tests:

Clearly, font-lock is not the coolprit for you.

Can you try to test a little differently to get more information:

  1. M-x profiler-start <RET> cpu <RET>
  2. Run your test
  3. M-x profiler-report <RET>
  4. Expand the most time-consuming functions with tab and show here.

yantar92 avatar Mar 21 '21 12:03 yantar92

I did a benchmark from your branch @yantar92. No performance improvement. Slower actually. Output of the tests:

Clearly, font-lock is not the coolprit for you.

Can you try to test a little differently to get more information:

  1. M-x profiler-start <RET> cpu <RET>
  2. Run your test
  3. M-x profiler-report <RET>
  4. Expand the most time-consuming functions with tab and show here.

cpu profiling doesn't give many clues, sorry to say. The following code is run two times, with only org-ql-open-buffers changing from t to nil:

(setq gw/files (org-roam--list-all-files)) ;only run in first test

(let ((org-ql-open-buffers t))
  (benchmark 1 '(org-ql-query
                  :select #'(org-get-heading t t t t)
                  :from gw/files
                  :where '(and (property "ID")))))

And the results:

With org-mode master, no customization, all org-mode-hooks removed:
Elapsed time: 113.843394s (3.958201s in 92 GCs)

          24  64% - command-execute
          24  64%  - funcall-interactively
          24  64%   - eval-last-sexp
          24  64%    - elisp--eval-last-sexp
          23  62%     - setq
          23  62%      - org-roam--list-all-files
          23  62%       + org-roam--list-files
           1   2%       let
          13  35% - ...
          10  27%  - benchmark-run
          10  27%   - let
          10  27%    - list
          10  27%     - benchmark-elapse
          10  27%      - let
          10  27%       - org-ql-query
          10  27%        - org-ql-select
          10  27%         - #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_11>
          10  27%          - find-file-noselect
          10  27%           - find-file-noselect-1
          10  27%            - after-find-file
          10  27%             - normal-mode
          10  27%              - set-auto-mode
          10  27%               - set-auto-mode-0
          10  27%                - org-mode
          10  27%                 - org-load-modules-maybe
           6  16%                  - require
           6  16%                   - byte-code
           4  10%                    - require
           4  10%                     - byte-code
           1   2%                      - require
           1   2%                         byte-code
           1   2%                      - custom-declare-variable
           1   2%                       - custom-initialize-reset
           1   2%                        - eval
           1   2%                         - funcall
           1   2%                          - #<compiled -0x1219b6703575d3ee>
           1   2%                             executable-find
           3   8%    Automatic GC


With org-mode master, no customization, all org-mode-hooks removed:
Elapsed time: 0.676947s (0.107029s in 3 GCs)

          35  83% - command-execute
          35  83%  - funcall-interactively
          34  80%   - eval-last-sexp
          34  80%    - elisp--eval-last-sexp
          34  80%     - let
          34  80%      - benchmark
          34  80%       - benchmark-run
          34  80%        - let
          34  80%         - list
          34  80%          - benchmark-elapse
          34  80%           - let
          34  80%            - org-ql-query
          34  80%             - org-ql-select
          24  57%              - #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_11>
          18  42%               - find-buffer-visiting
          10  23%                - file-truename
           4   9%                   file-truename
           8  19%                #<compiled 0xe8616e256fbcfaf>
           1   2%   - previous-line
           1   2%      line-move
           7  16% - ...
           7  16%    Automatic GC

Whil- avatar Mar 21 '21 17:03 Whil-

cpu profiling doesn't give many clues, sorry to say.

The profiler report looks very suspicious.

62% + org-roam--list-files

Did it really take over one minute to run this? Looking at the org-roam's source code, this functions just finds all org files in directory. It should not take that much time.

27% - org-load-modules-maybe

20 seconds to load modules? Also very suspicious.

I suspect that you somehow executed profiler-report multiple times. If you did it, what you saw was only the profiling between last two runs of the profiler-report.

yantar92 avatar Mar 21 '21 23:03 yantar92

Did it really take over one minute to run this? Looking at the org-roam's source code, this functions just finds all org files in directory. It should not take that much time.

Yes it does take that time. Probably not CPU-bound. It can be an issue related more to Windows than Linux though. Not sure how multi-file Org ql queries behave on Linux. But on Windows 10 (on a pretty beefed up hardware spec as well!) the numbers reported above is what I see. No fabrication. Would be good to have a public set of files to run the same test on, to make it reproducible.. If you don't have access to a Windows config then you have to trust me for now.

20 seconds to load modules? Also very suspicious.

I think you're mixing up CPU time with real time. The profiler doesn't seem to capture both.

I suspect that you somehow executed profiler-report multiple times. If you did it, what you saw was only the profiling between last two runs of the profiler-report.

I've started each test with (profiler-start), after execution ran (profiler-stop) and (profiler-report) and between tests (profiler-reset). Not sure if this is best practice. But I agree that the profiling does look odd...

As I've understood from previous use of (benchmark n #'function) the time in parenthesis represents CPU time of sort and the time before is real time. So a lot of time where the CPU is simply waiting. I'm not 100% sure if this is the correct interpretation of the benchmark-numbers.

Whil- avatar Mar 21 '21 23:03 Whil-

I don't mind "hosting" discussion relevant to improving performance of Org-related code on my repo, but it does make it difficult to know where the discussion specific to this PR stands and what should be done next for it.

@Whil- Would you please write some kind of status update about this PR's code and how it relates to org-ql specifically?

For benchmarking, I generally recommend using these macros I wrote: https://github.com/alphapapa/emacs-package-dev-handbook#bench-multi-lexical They present results in a helpful way, and they help ensure that the code is doing the same thing each time it's run (several times it's caught mistakes of mine that would have invalidated the results). They also ensure that lexical-binding is used, which can affect the results.

Also, while I'm very enthusiastic about the native-comp branch, of course, I'm not sure if it's a good idea to use it when benchmarking packages, especially if different platforms or users are involved.

BTW, please see this old feature branch in org-rifle, which attempts some similar optimizations. I haven't touched it in a few years, and I don't remember its status or why I hadn't merged it yet (I might have notes about it somewhere). https://github.com/alphapapa/org-rifle/commit/ad4cdf8e241cea0d4d11e480ac2b46f0c7306e1f

@akirak I don't aim to break any APIs that would require changes in your packages. If some of the ideas in this PR make it into org-ql, they should probably not be used by default, and they should probably be activated by let-binding a variable around relevant code, and the developer who does so should know what he's doing (so we would need to document the feature carefully).

alphapapa avatar Mar 22 '21 00:03 alphapapa

@Whil- Would you please write some kind of status update about this PR's code and how it relates to org-ql specifically?

The same status as in the first post. I.e. I propose a configuration that allows the user to run org-ql without opening buffers for all the files that aren't already in buffers. Using a hidden work-buffer as described above. Two reasons for the patch:

  1. SIGNIFICANT speedup!
  2. To allow Org-ql to run queries on files without leaving open buffers after execution finishes.

For (1), the caveat is that I've only tested it on Windows, where the performance is more than 100 times faster. I suspect Linux to also be faster but maybe not by as much as 100 times.

Also, while I'm very enthusiastic about the native-comp branch, of course, I'm not sure if it's a good idea to use it when benchmarking packages, especially if different platforms or users are involved.

I do agree... But setting up Emacs again and separately for this performance test was not an option. I hoped that the initial benchmark showing the difference would be enough. In the best of worlds performance tests are included in your build scripts of course, but even that will probably not capture all the relevant variations here, as I doubt Github allows automating windows builds!?

Anyhow, the above is the latest status. Hope it clarifies something!

Whil- avatar Mar 22 '21 01:03 Whil-

@Whil- I meant the latest status with regard to the issues I mentioned, which would have to be resolved before this could be merged. If you don't plan to work on them yourself, then this PR probably can only serve as proof-of-concept for future work.

alphapapa avatar Mar 22 '21 01:03 alphapapa

@Whil- I meant the latest status with regard to the issues I mentioned, which would have to be resolved before this could be merged.

Oh, didn't catch that. Sorry. You mean the interplay with Org-ql-view then. And the issues due to hooks not being run. I'm not using the view so not sure how involved that change would be. First version solution could maybe be to override the new user customization to always make sure files are read into buffers? Then raw queries can get a speedup without affecting the rest? And the rest can be iterated upon in steps later on, when inspiration comes?

The hooks issue, I have no idea how to solve that, so my time would be wasted in trying to fix it. I see it as a trade off for now. 100x performance boost for 90% of functionality, or 100% functionality and no boost.

I understand if you consider this to low quality for inclusion. In such case, no hard feelings of you choose to close the pr. Personally I can still use my own fork.

Whil- avatar Mar 22 '21 02:03 Whil-

First version solution could maybe be to override the new user customization to always make sure files are read into buffers? Then raw queries can get a speedup without affecting the rest? And the rest can be iterated upon in steps later on, when inspiration comes?

I'm not sure exactly what you mean.

The hooks issue, I have no idea how to solve that, so my time would be wasted in trying to fix it. I see it as a trade off for now. 100x performance boost for 90% of functionality, or 100% functionality and no boost.

I'm afraid it's a matter of correctness: this change would probably cause some searches to return incorrect results, which would be equivalent to something like 0% functionality. False negatives are the worst bug a search tool can have, and an untrustworthy tool is useless.

I understand if you consider this to low quality for inclusion. In such case, no hard feelings of you choose to close the pr. Personally I can still use my own fork.

It's not that it's low-quality, but that it needs more work to be viable.

The use-case of searching files that are not yet open in Emacs buffers is an important one, and if its performance can be improved, it should be. But due to the issues mentioned, it would have to be done carefully. Why don't you open an issue so we can discuss it in detail.

(As an aside, I'd eventually like to integrate org-ql with an indexer for searching unopened files without opening them, which might obviate this issue to an extent. It's tricky, though, because of some of the things Org does automatically when opening Org files, which the indexer would have to be careful to do the equivalent of, to ensure that the same results are returned for both opened and unopened files.)

alphapapa avatar Mar 22 '21 02:03 alphapapa

As I've understood from previous use of (benchmark n #'function) the time in parenthesis represents CPU time of sort and the time before is real time. So a lot of time where the CPU is simply waiting. I'm not 100% sure if this is the correct interpretation of the benchmark-numbers.

Regarding the profiling, I think @yantar92 is right that those results don't look right.

AIUI the issue is not with org-ql but with Org itself and the things it does when activating org-mode in a buffer. You should probably benchmark a simple function that opens all of those Org files with and then without delaying the mode hooks. Then, assuming the benchmark results are as expected, you should probably choose a relatively large Org file (large enough that the timings won't be merely noise) and profile activating org-mode in just that file's buffer, with and then without delaying the mode hooks.

Assuming our intuitions are right, that should show what exactly Org is doing that takes so much time. And then, maybe some of those specific things could be optimized in Org. And from org-ql's perspective, maybe some of those individual operations could be omitted or delayed when opening files during a search, because I think delaying all of the mode hooks is too blunt an instrument.

alphapapa avatar Mar 22 '21 02:03 alphapapa

.... I think delaying all of the mode hooks is too blunt an instrument.

I also think so. Instead of intrusive delay-mode-hook, Org mode provides org-inhibit-startup that can be let-bound around org-mode call. It is used in Org mode iteself to open archive files faster.

More specific to the @Whil- results, there is org-modules-loaded that can be let-bound to t to suppress loading modules. Though it has similar (though smaller) risks in comparison with delay-mode-hooks.

yantar92 avatar Mar 22 '21 02:03 yantar92

First version solution could maybe be to override the new user customization to always make sure files are read into buffers? Then raw queries can get a speedup without affecting the rest? And the rest can be iterated upon in steps later on, when inspiration comes?

I'm not sure exactly what you mean.

What I meant was that the customization, though blunt, still has it's use and as a first step in implementing a speedup could be to always override the value of org-ql-open-buffers when correctness is absolutely necessary. Writing this I see how that's probably not good enough though. So scratch that idea.

The use-case of searching files that are not yet open in Emacs buffers is an important one, and if its performance can be improved, it should be. But due to the issues mentioned, it would have to be done carefully. Why don't you open an issue so we can discuss it in detail.

While I'd also like to dig in to the details of this and do it better I'm not sure of what I can contribute. High risk of this becoming a real time sink, something I'm not prepared to go into right now. The "catch all" disable events solution may be to blunt to be included in Org ql, even with a disclaimer. But I'll continue to use it anyhow since there is no real option. I'll just have to stay aware of the corner cases.

AIUI the issue is not with org-ql but with Org itself and the things it does when activating org-mode in a buffer. You should probably benchmark a simple function that opens all of those Org files with and then without delaying the mode hooks. Then, assuming the benchmark results are as expected, you should probably choose a relatively large Org file (large enough that the timings won't be merely noise) and profile activating org-mode in just that file's buffer, with and then without delaying the mode hooks.

Assuming our intuitions are right, that should show what exactly Org is doing that takes so much time. And then, maybe some of those specific things could be optimized in Org. And from org-ql's perspective, maybe some of those individual operations could be omitted or delayed when opening files during a search, because I think delaying all of the mode hooks is too blunt an instrument.

I'm not even sure it's Org that is the reason. It could just as well be Emacs itself. Actually, to get a bit more evidence I ran a bunch of performance tests on only loading files into emacs. I won't go through all results but some interesting findings:

  1. There must be something messed up with my systems! Because moving the whole set of files from my "main" location to a temporary location changed the loading time quite a lot... around 4x. From ~60s to ~14s. Suspecting my file-sync services here...
  2. Changing files to .txt (and thus not load org-mode) reduced loading times 2x, from 14s to around 8s
  3. Using a temp buffer to, in sequence, load each file into the temp buffer, replacing the content for each file, reduced the processing to 0.1s.

Tests:

(benchmark 1 '(mapc #'find-file-noselect (directory-files "c:/PIM/Notes/" t ".org$"))) ; Sooo slow
(benchmark 1 '(mapc #'find-file-noselect (directory-files "c:/tmp/Notes/" t ".org$"))) ; 4x faster!? wtf?
(benchmark 1 '(mapc #'find-file-noselect (directory-files "c:/PIM/Notes/" t ".txt$"))) ; 2x faster than previous test

(with-temp-buffer
  (org-mode)
  (benchmark 1 '(mapc #'(lambda (file) (insert-file-contents file nil nil nil 'replace))
                 (directory-files "c:/tmp/Notes/" t ".txt$")))) ; 80x faster than previoius test

To conclude: No the performance issue does not lie with org-ql. The patch lets org-ql be agnostic to the loading files into buffers performance issue. But shortcuts taken in relation to delaying mode hooks may be to blunt to allow this into the code anyhow.

Whil- avatar Mar 22 '21 20:03 Whil-

@Whil- I tried your benchmark on my Org folder with 36 large org files. I do not have file-system issues and use SSD, so I was able to progressively disable the most time-consuming org-mode functions. The benchmark time reduced from 5.5sec to 0.4sec. The benchmark without org-mode-restart call takes 0.2sec. The benchmark also without org-mode call at the beginning takes 0.19sec.

I used the following benchamrk:

(with-temp-buffer
  (org-mode)
  (benchmark 1 '(mapc #'(lambda (file)
                          (with-silent-modifications
			    (insert-file-contents file nil nil nil 'replace))
                          (cl-letf ((inhibit-message t)
                                    (org-inhibit-startup t)
                                    (org-mode-hook nil)
                                    ((symbol-function 'org-macro-initialize-templates) (lambda () nil))
                                    ((symbol-function 'org-update-radio-target-regexp) (lambda () nil))
                                    ;; org--collect-keywords-1 should not be suppressed
                                    ((symbol-function 'org--set-faces-extend) (lambda (&rest _) nil))
                                    )
	                    (org-mode-restart)))
                      (directory-files "~/Org/" t ".org$"))))

The most time-consuming components are the following:

  1. org-mode-hook (in my case, it is because of slow org-ref-setup-label-finders). Though it must not be disabled without user consent.
  2. org-macro-initialize-templates, org-update-radio-target-regexp, and org--set-faces-extend: should be irrelevant to org-ql
  3. org--collect-keywords-1. This one should not be suppressed as it contributes to in-buffer org-mode settings. For example, this function process "SETUP" directives.

yantar92 avatar Mar 23 '21 13:03 yantar92