cmake-ide
Long file-open times on large projects
I'm using cmake-ide with rtags and have a project which generates a large (159M) compile_commands.json file. This seems to be the cause of a minute-plus delay when opening a new file. Most of the CPU usage is in cmake-ide--on-cmake-finish, and inside of it, most of the work is in json-read-file and the delete-dups inside of cmake-ide--commands-to-hdr-flags.
Is there anything I can do to mitigate the performance problem?
Failing that, I thought I'd bring the issue to your attention.
That's an enormous compile_commands.json. I recently added performance improvements to the JSON reading for my own use case, but this is a lot larger. At that size, I'm assuming it's proprietary? I'll see what I can do, but the fact is that Emacs Lisp isn't that fast.
Understood. The project in question is proprietary. Thanks.
I wonder if switching to a binary format would be faster. For example, MongoDB uses BSON (http://bsonspec.org/). There appear to be several Elisp libraries for reading/writing this format:
https://github.com/m2ym/mongo-el/blob/master/bson.el
https://github.com/casualjim/emacs.d/blob/master/elpa/mongo-20120826.14/bson.el
But what CMake supports is the JSON compilation database.
I understand. What I am suggesting is creating a BSON version of the file, something that Elisp can read much more quickly. The JSON file would still have to be read once, but opening new files in Emacs after that should be much faster.
Interesting
I'd be happy to test new versions when they're ready.
As of this commit cmake-ide is no longer trying to parse the JSON compilation database every time a project file is opened. I haven't implemented any sort of binary data saving since for my 400k SLOC work project it's fast enough right now. It's only slow the first time a file is opened. All others should be fast. Can you check with your project? If it's fast enough for you it should be fast enough for everyone.
Opening files is still very slow here (project isn't that big).
By stripping out useless flags I managed to get the time down from ~4-5 seconds to ~2-3 seconds, but it's still slow enough to be annoying.
Could this file be converted to an intermediate format that can be parsed faster? (elisp literal for example). Python's parser runs in 0.008 seconds here, so it could be used to convert the json to elisp (as could any other language with a fast json parser).
Or could the file be kept in memory with a time-stamp and only re-read when the time-stamp on disk changes?
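To make the two suggestions above concrete, here is a minimal sketch of an external converter in Python. It writes the compilation database as an Emacs Lisp list of alists and only regenerates the output when the JSON file is newer. The function names (`elisp_string`, `entry_to_sexp`, `convert_if_stale`) and the output layout are assumptions made for illustration; none of this is part of cmake-ide, and whether reading the result is actually faster than json-read-file has not been measured here.

```python
import json
import os


def elisp_string(text):
    """Quote text as an Emacs Lisp string literal (escape backslashes and quotes)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def entry_to_sexp(entry):
    """Render one compilation-database entry as an alist such as
    ((directory . "...") (file . "...") (command . "..."))."""
    pairs = []
    for key in ("directory", "file", "command"):
        if key in entry:
            pairs.append("(%s . %s)" % (key, elisp_string(entry[key])))
    # Some generators emit an "arguments" list instead of a "command" string.
    if "arguments" in entry:
        args = " ".join(elisp_string(a) for a in entry["arguments"])
        pairs.append("(arguments %s)" % args)
    return "(" + " ".join(pairs) + ")"


def convert_if_stale(json_path, sexp_path):
    """Rewrite sexp_path from json_path only if it is missing or older.

    Returns True when the file was regenerated, False when it was up to date.
    """
    if (os.path.exists(sexp_path)
            and os.path.getmtime(sexp_path) >= os.path.getmtime(json_path)):
        return False
    with open(json_path, encoding="utf-8") as f:
        entries = json.load(f)
    with open(sexp_path, "w", encoding="utf-8") as f:
        f.write("(\n")
        for entry in entries:
            f.write(entry_to_sexp(entry) + "\n")
        f.write(")\n")
    return True
```

On the Emacs side, a file like this could presumably be loaded by inserting it into a temporary buffer and calling `read` on that buffer, which avoids the JSON parser entirely. The same modification-time comparison could also be done inside Emacs to decide whether an in-memory copy is still valid.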
There are probably ways around it. The main issue is that elisp isn't that fast and at the time I last tried to tackle this there was no way to write Emacs extensions in C.
Is it possible to cache some more operations? I'm still seeing a noticeable slowness (1-2 seconds to open a file) for my project with about 700 source files and a 2.6 MB compile-commands.json. Normally I'm okay with paying the 2 second penalty, but when I start doing large refactoring (e.g. using projectile-replace across a few dozen files), the 2 second penalty starts to add up.
Running the profiler, I see that the time is spent almost entirely in cmake-ide--set-flags-for-file (called from cmake-ide--on-cmake-finished) and the garbage collector. Here's an example of opening a file (after having already opened other files in the project). cmake-ide--src-buffers has one file and cmake-ide--hdr-buffers is empty:
- command-execute 943 70%
- call-interactively 943 70%
- funcall-interactively 819 61%
- dired-find-file 612 45%
- find-file 611 45%
- find-file-noselect 603 45%
- find-file-noselect-1 601 44%
- after-find-file 601 44%
- run-hooks 589 44%
- cmake-ide-maybe-run-cmake 573 42%
- cmake-ide--on-cmake-finished 569 42%
- mapc 540 40%
- #<compiled 0x40bc0f65> 540 40%
- cmake-ide--set-flags-for-file 540 40%
- cmake-ide--commands-to-hdr-flags 417 31%
- cmake-ide--delete-dup-hdr-flags 182 13%
- cmake-ide--filter 91 6%
- mapcar 90 6%
- #<compiled 0x41563a67> 90 6%
cmake-ide--dash-i-or-dash-d-p 5 0%
- cmake-ide--flags-filtered 89 6%
- cmake-ide--filter 89 6%
- mapcar 87 6%
- #<compiled 0x4153665b> 85 6%
- #<compiled 0x4146d3f5> 83 6%
cmake-ide--dash-i-or-dash-d-p 9 0%
delete-dups 2 0%
- cmake-ide--args-to-only-flags 115 8%
- cmake-ide--filter 115 8%
- mapcar 55 4%
- #<compiled 0x41522be5> 55 4%
- #<compiled 0x4146d1e5> 55 4%
- cmake-ide--is-src-file 55 4%
- cl-some 5 0%
#<compiled 0x445449e5> 2 0%
- mapcar 101 7%
- cmake-ide--remove-compiler-from-args-string 101 7%
- cmake-ide--split-command 101 7%
- split-string-and-unquote 84 6%
- split-string-and-unquote 65 4%
- split-string-and-unquote 57 4%
- split-string-and-unquote 54 4%
split-string 51 3%
split-string 3 0%
split-string 1 0%
split-string 1 0%
- cmake-ide--filter 19 1%
+ mapcar 15 1%
- cmake-ide--idb-all-commands 84 6%
- mapcar 82 6%
- #<compiled 0x4146871f> 82 6%
- cmake-ide--file-params-to-args 66 4%
- cmake-ide--split-command 64 4%
- split-string-and-unquote 47 3%
- split-string-and-unquote 41 3%
+ split-string-and-unquote 29 2%
split-string 1 0%
+ mapcar 2 0%
s-join 16 1%
+ cmake-ide--idb-all-objs 2 0%
- cmake-ide--set-flags-for-src-file 38 2%
- cmake-ide-set-compiler-flags 37 2%
- cmake-ide--flags-to-include-paths 36 2%
- mapcar 35 2%
- #<compiled 0x4146d36d> 35 2%
- cmake-ide--get-build-dir 35 2%
- cmake-ide--locate-project-dir 33 2%
- cmake-ide--locate-cmakelists 33 2%
- cmake-ide--locate-cmakelists-impl 33 2%
+ cmake-ide--locate-cmakelists-impl 21 1%
+ locate-dominating-file 12 0%
+ cmake-ide--to-simple-flags 1 0%
+ cmake-ide--filter-ac-flags 1 0%
+ cmake-ide--params-to-src-includes 1 0%
cmake-ide--message 1 0%
+ cmake-ide--cdb-json-file-to-idb 15 1%
+ cmake-ide--run-rc 14 1%
+ cmake-ide-maybe-start-rdm 3 0%
+ cmake-ide--need-to-run-cmake 1 0%
+ vc-refresh-state 16 1%
+ normal-mode 12 0%
file-truename 1 0%
+ create-file-buffer 1 0%
+ switch-to-buffer 8 0%
dired-get-file-for-visit 1 0%
+ execute-extended-command 158 11%
+ find-file-at-point 31 2%
+ next-line 13 0%
+ previous-line 3 0%
+ profiler-report-toggle-entry 1 0%
+ dired-next-line 1 0%
+ byte-code 124 9%
- ... 343 25%
Automatic GC 333 24%
+ minibuffer-complete 10 0%
+ timer-event-handler 24 1%
+ redisplay_internal (C function) 22 1%
+ rtags-diagnostics-process-filter 5 0%
It looks like there's a lot of expensive string processing that goes on in translating the single entry for a file's command line into the appropriate entries. However, for a project, the flags for files will often be repeated over a large number of files. For example, in this project of 700 files, there are 8 different modules with distinct command line args, so there are only 8 unique combinations of command line params.
I'm not an elisp expert, but would it be possible to cache the parsed data structure that was derived from the command line args, and store it in a map keyed by the original command line string (minus the source file name)? Then, when entering cmake-ide--set-flags-for-file, you could check the map to see if the set already exists and use that.
A bonus would be that you wouldn't need to throw this list away even if you generated a new compile-commands, since the compilation strings would uniquely determine the rest of the config; in fact, you could share this map across all projects (although it would be unlikely that they would be shared, it would also be unnecessary to sequester them).
@sbroberg Thanks for the analysis. I'll see what I can do when I have time. In the meantime you can tune Emacs's GC to have fewer collections. I have this in my init.el:
(set 'gc-cons-threshold 100000000)