helm-bibtex
helm-bibtex copied to clipboard
Large difference in speed when using one notes file versus many notes files
I noticed that the first call to helm-bibtex
was always quite slow and decided to investigate. Long story short, the call to bibtex-completion-prepare-entry
was taking a long time. It turns out that this snippet (part of bibtex-completion-prepare-entry
) was to blame:
(entry (if (or
;; One note file per entry:
(and bibtex-completion-notes-path
(f-directory? bibtex-completion-notes-path)
(f-file? (f-join bibtex-completion-notes-path
(s-concat entry-key
bibtex-completion-notes-extension))))
;; All notes in one file:
(and bibtex-completion-notes-path
(f-file? bibtex-completion-notes-path)
(with-current-buffer (find-file-noselect bibtex-completion-notes-path)
(save-excursion
(save-restriction
(widen)
(goto-char (point-min))
(re-search-forward (format bibtex-completion-notes-key-pattern (regexp-quote entry-key)) nil t))))))
(cons (cons "=has-note=" bibtex-completion-notes-symbol) entry)
entry))
Since I had one large file, I hit the second part (all notes in one file). The regex search is slow, especially when it has to do this 1000 times to parse every entry in the bib
file. I noticed that the other code looks like it would be faster and decided to convert my large org file into many small org files. There's not a built-in way to do this though, so I wrote the following function:
(defun my/prop-to-file ()
"Take the Custom_ID property and make a new file with just the entry"
(let* ((props (org-entry-properties (point) "Custom_ID"))
(id (cdr (assoc "CUSTOM_ID" props)))
(filename (concat "/path/to/new/notes/directory/" id ".org")))
(org-copy-subtree)
(with-temp-file filename (yank))))
This assumes that your org file has note entries setup where the Custom_ID
property is the bib key. I then opened up my bib file, widened the org buffer, put point on the first heading, and recorded a keyboard macro that applies this function then moved to the next heading. Then I applied that macro to all the headings in the buffer (which took a while).
After changing the relevant variables (like bibtex-completion-notes-path
), the first call to helm-bibtex
is about 5 or 7 times faster. It's quite noticible.
Dropping this note here with the hope that maybe it helps someone else.
Thank you very much for sharing the results of your investigation. The algorithm for finding notes in indeed rather naive. In my setup, initial parsing is reasonably fast (3s), and for that reason I never improved the code. I guess in the single-file case it would be better to first scan the notes file for entries and to build a look-up table that can later be used to annotate the entries. A variant of this approach may also speed up loading in the multi-file setup, not sure.
Also, instead of searching the entire buffer (as it does now), maybe we can use org-element-parse-buffer to parse only the headings. This should make loading faster. Here's a test:
(defun keys ()
(with-temp-buffer
(insert-file-contents bibtex-completion-notes-path)
(let ((tree (org-element-parse-buffer 'headline)))
(org-element-map tree 'headline
(lambda (id) (org-element-property :CUSTOM_ID id))))))
(helm :sources (helm-build-sync-source "Notes keys"
:candidates (lambda () (reverse (keys))))
:buffer "*helm sync source*")
Then with a cached list of the results/keys, we can use the member function to check if we have ENTRY-KEY for notes.
This is what I meant: https://github.com/jagrg/helm-bibtex/commit/bed65841034ef4bfaa6e01920f5b51eb5dc23a11
Did you test how much faster this is compare to the current code?
One potential improvement: Wouldn't it be faster and conserve more memory resources to recycle the notes buffer if it already exists (as the existing code does)?
On 16 Feb 2018, Titus von der Malsburg [email protected] wrote:
Did you test how much faster this is compare to the current code?
About 1.5 seconds faster using the elp library. That's 4 and 2.5 seconds, respectively.
One potential improvement: Wouldn't it be faster and conserve more memory resources to recycle the notes buffer if it already exists (as the existing code does)?
In terms of speed, I don't think there's much difference between (find-file-noselect bibtex-completion-notes-path) and (insert-file-contents bibtex-completion-notes-path) if that's what you mean.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.*
-- Jonathan Monthly Atmospheric CO₂: January 2018: 407.98, January 2017: 406.13
In terms of speed, I don't think there's much difference between (find-file-noselect bibtex-completion-notes-path) and (insert-file-contents bibtex-completion-notes-path) if that's what you mean.
No difference is expected when the notes buffer isn't already loaded, but if it is loaded, the new code loads the file into a temp buffer when it could just reuse the existing buffer. Perhaps the difference is too small to matter?
Could you please make a PR? Thank you.
I think the difference is too small.