helm-bibtex icon indicating copy to clipboard operation
helm-bibtex copied to clipboard

Large difference in speed when using one notes file versus many notes files

Open jabranham opened this issue 6 years ago • 7 comments

I noticed that the first call to helm-bibtex was always quite slow and decided to investigate. Long story short, the call to bibtex-completion-prepare-entry was taking a long time. It turns out that this snippet (part of bibtex-completion-prepare-entry) was to blame:

(entry (if (or
                       ;; One note file per entry:
                       (and bibtex-completion-notes-path
                            (f-directory? bibtex-completion-notes-path)
                            (f-file? (f-join bibtex-completion-notes-path
                                             (s-concat entry-key
                                                       bibtex-completion-notes-extension))))
                       ;; All notes in one file:
                       (and bibtex-completion-notes-path
                            (f-file? bibtex-completion-notes-path)
                            (with-current-buffer (find-file-noselect bibtex-completion-notes-path)
                              (save-excursion
                                (save-restriction
                                  (widen)
                                  (goto-char (point-min))
                                  (re-search-forward (format bibtex-completion-notes-key-pattern (regexp-quote entry-key)) nil t))))))
                      (cons (cons "=has-note=" bibtex-completion-notes-symbol) entry)
                    entry))

Since I had one large file, I hit the second part (all notes in one file). The regex search is slow, especially when it has to do this 1000 times to parse every entry in the bib file. I noticed that the other code looks like it would be faster and decided to convert my large org file into many small org files. There's not a built-in way to do this though, so I wrote the following function:

(defun my/prop-to-file ()
  "Take the Custom_ID property and make a new file with just the entry"
  (let* ((props (org-entry-properties (point) "Custom_ID"))
         (id (cdr (assoc "CUSTOM_ID" props)))
         (filename (concat "/path/to/new/notes/directory/" id ".org")))
    (org-copy-subtree)
    (with-temp-file filename (yank))))

This assumes that your org file has note entries setup where the Custom_ID property is the bib key. I then opened up my bib file, widened the org buffer, put point on the first heading, and recorded a keyboard macro that applies this function then moved to the next heading. Then I applied that macro to all the headings in the buffer (which took a while).

After changing the relevant variables (like bibtex-completion-notes-path), the first call to helm-bibtex is about 5 or 7 times faster. It's quite noticible.

Dropping this note here with the hope that maybe it helps someone else.

jabranham avatar Jan 27 '18 15:01 jabranham

Thank you very much for sharing the results of your investigation. The algorithm for finding notes in indeed rather naive. In my setup, initial parsing is reasonably fast (3s), and for that reason I never improved the code. I guess in the single-file case it would be better to first scan the notes file for entries and to build a look-up table that can later be used to annotate the entries. A variant of this approach may also speed up loading in the multi-file setup, not sure.

tmalsburg avatar Jan 27 '18 16:01 tmalsburg

Also, instead of searching the entire buffer (as it does now), maybe we can use org-element-parse-buffer to parse only the headings. This should make loading faster. Here's a test:

(defun keys ()
  (with-temp-buffer
    (insert-file-contents bibtex-completion-notes-path)
    (let ((tree (org-element-parse-buffer 'headline)))
      (org-element-map tree 'headline
	(lambda (id) (org-element-property :CUSTOM_ID id))))))

(helm :sources (helm-build-sync-source "Notes keys"
		 :candidates (lambda () (reverse (keys))))
      :buffer "*helm sync source*")

Then with a cached list of the results/keys, we can use the member function to check if we have ENTRY-KEY for notes.

jagrg avatar Feb 06 '18 22:02 jagrg

This is what I meant: https://github.com/jagrg/helm-bibtex/commit/bed65841034ef4bfaa6e01920f5b51eb5dc23a11

jagrg avatar Feb 06 '18 22:02 jagrg

Did you test how much faster this is compare to the current code?

One potential improvement: Wouldn't it be faster and conserve more memory resources to recycle the notes buffer if it already exists (as the existing code does)?

tmalsburg avatar Feb 16 '18 11:02 tmalsburg

On 16 Feb 2018, Titus von der Malsburg [email protected] wrote:

Did you test how much faster this is compare to the current code?

About 1.5 seconds faster using the elp library. That's 4 and 2.5 seconds, respectively.

One potential improvement: Wouldn't it be faster and conserve more memory resources to recycle the notes buffer if it already exists (as the existing code does)?

In terms of speed, I don't think there's much difference between (find-file-noselect bibtex-completion-notes-path) and (insert-file-contents bibtex-completion-notes-path) if that's what you mean.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.*

-- Jonathan Monthly Atmospheric CO₂: January 2018: 407.98, January 2017: 406.13

jagrg avatar Feb 16 '18 17:02 jagrg

In terms of speed, I don't think there's much difference between (find-file-noselect bibtex-completion-notes-path) and (insert-file-contents bibtex-completion-notes-path) if that's what you mean.

No difference is expected when the notes buffer isn't already loaded, but if it is loaded, the new code loads the file into a temp buffer when it could just reuse the existing buffer. Perhaps the difference is too small to matter?

Could you please make a PR? Thank you.

tmalsburg avatar Feb 17 '18 10:02 tmalsburg

I think the difference is too small.

jagrg avatar Feb 18 '18 16:02 jagrg