obsidian.el icon indicating copy to clipboard operation
obsidian.el copied to clipboard

Slow performance when opening files or invoking functions

Open nadavspi opened this issue 2 years ago • 17 comments

Love the idea of this package!

I have about 5000 files and I'm seeing a delay of a few seconds whenever I open a file in my Obsidian path, or callobsidian-jump, obsidian-backlink-jump, etc. I'm using a pretty much out of box Doom config. Files within my obsidian directory open quickly when obsidian.el isn't configured.

Is this expected? Anything I can do to help debug?

nadavspi avatar Dec 05 '22 18:12 nadavspi

We need to add caching, it would fix the lag. I was expecting somebody with a large vault to comment. On 5 Dec 2022 at 18:10 +0000, Nadav Spiegelman @.***>, wrote:

Love the idea of this package! I have about 5000 files and I'm seeing a delay of a few seconds whenever I open a file in my Obsidian path, or callobsidian-jump, obsidian-backlink-jump, etc. I'm using a pretty much out of box Doom config. Files within my obsidian directory open quickly when obsidian.el isn't configured. Is this expected? Anything I can do to help debug? — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.Message ID: @.***>

licht1stein avatar Dec 05 '22 18:12 licht1stein

I figured as much after looking through the code 😀

nadavspi avatar Dec 05 '22 18:12 nadavspi

I would be very grateful for a PR :) On 5 Dec 2022 at 18:52 +0000, Nadav Spiegelman @.***>, wrote:

I figured as much after looking through the code 😀 — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

licht1stein avatar Dec 05 '22 18:12 licht1stein

I'm not sure my lisp chops will allow for that, but I may try.

nadavspi avatar Dec 05 '22 19:12 nadavspi

I'll be glad to help. Don't have time to do it quickly myself, but I'll be glad to comment on your efforts. On 5 Dec 2022 at 19:20 +0000, Nadav Spiegelman @.***>, wrote:

I'm not sure my lisp chops will allow for that, but I may try. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

licht1stein avatar Dec 05 '22 19:12 licht1stein

Did this issue get fixed for anyone? Jumping to notes is still slow for me.

DrBanana419 avatar Jan 30 '24 17:01 DrBanana419

Did this issue get fixed for anyone? Jumping to notes is still slow for me.

yes

licht1stein avatar Jan 30 '24 17:01 licht1stein

Did this issue get fixed for anyone? Jumping to notes is still slow for me.

In my own branch I've commented out the (obsidian-update) call inside the obsidian-jump function. This greatly increases how quickly I can jump between notes, but obviously sacrifices some of the guarantees of updated tags and file cache.

jayemar avatar Jan 30 '24 18:01 jayemar

im also experiencing slowness

In my own branch I've commented out the (obsidian-update) call inside the obsidian-jump function. This greatly increases how quickly I can jump between notes, but obviously sacrifices some of the guarantees of updated tags and file cache.

@licht1stein would be nice to have an option to disable this in the config

smallstepman avatar Feb 05 '24 20:02 smallstepman

Sure, I will look into it when I have time, if somebody else doesn't implement it

licht1stein avatar Feb 11 '24 19:02 licht1stein

I did a bunch of tests today. For my use case, I managed to make things 1000 times faster.

I've made a very very low quality commit of the things I've tried here.

Here's a summary:

  • avoid building big intermediary lists (e.g. pass the predicates directly to directory-files-recursively)
  • try to make obsidian-file-p as fast as possible
    • run faster, simpler predicates earlier and the heaviest later
    • check if it ends in .md early
    • check if path is inside .git or node_modules early
    • avoid obsidian-descendant-of-p
  • listing files using projectile-dir-files-alien instead of directory-files-recursively (This wouldn't be generally applicable, as it assumes that projectile is installed and that we're currently in a project.)
  • reduce the number of with-temp-buffer, this one I discovered using the profiler

Here's some utilities that I used over and over to help be optimize stuff:

(defmacro fsta/comment (&rest body)
  "Comment BODY."
  nil)

(defmacro fsta/measure-time (&rest body)
  "Measure the time it takes to evaluate BODY."
  `(let ((time (current-time)))  ;; TODO Time should be gensym'd
     (prog1
         (progn ,@body)
       (message "%.06fs" (float-time (time-since time))))))

(defun fsta/profile (fn)
  (unwind-protect
      (progn
        (profiler-start 'cpu+mem)
        (funcall fn)
        (profiler-report))
    (profiler-stop)))

Here are the functions that takes the most time:

  • obsidian-file-p: check if a path is a markdown file, it's not slow per-se, but it is called repeatedly
  • obsidian-reset-cache: it lists all the files)
  • obsidian-update-tags-list: it reads all files and looks for tags
  • obsidian--update-all-from-front-matter: it reads all the files and parse their "front matter"

Here are some other ideas that I didn't have time to test:

  • use a hash-table instead of a list for caching the list of files
  • use one timestamp per file (easy to do once you use a hash-table)
    • if the entry is expired, check the file's modification time to decide whether to re-read it or not
  • use emacs' file watching facilities to update the cache as soon as a file is changed
  • use hooks (like after-save-hook) to update a file's entry in the cache as soon as it's saved
  • in obsidian-file-p: if a file is already in the cache, then it means it already passed the predicate, we don't need to check every conditions again
  • extracting tags and extracting the front matter should be done at the same time, currently (AFAIR) every files are read twice when calling (obsidian-update)

(I might make a PR, if I have time, no promise :sweat_smile:)

fstamour avatar Feb 13 '24 03:02 fstamour

Wow, this is great work! I would be very grateful for the PR, with exception of the projectile-related part. I also think that hash-tables would be a better way to handle things.

licht1stein avatar Feb 13 '24 11:02 licht1stein

Notes for myself:

  • there's 2 ci jobs: one for testing and one for publishing
  • the test jobs calls make test
  • make test calls eldev -C --unstable -T test
    • -T is short for --time
    • -C is short for --color
    • --unstable tells eldev to use melpa-unstable
  • https://github.com/emacs-eldev/eldev
  • eldev is configured with the Eldev file
  • the tests use the (BDD) test framework buttercup
  • there's already a dummy vault in tests/test-vault/

fstamour avatar Feb 26 '24 13:02 fstamour

Notes for myself:

  • there's 2 ci jobs: one for testing and one for publishing

  • the test jobs calls make test

  • make test calls eldev -C --unstable -T test

    • -T is short for --time
    • -C is short for --color
    • --unstable tells eldev to use melpa-unstable
  • https://github.com/emacs-eldev/eldev

  • eldev is configured with the Eldev file

  • the tests use the (BDD) test framework buttercup

  • there's already a dummy vault in tests/test-vault/

That would actually be a great addition to readme

licht1stein avatar Feb 26 '24 14:02 licht1stein

It would have been funny if it was already there, because I didn't even look at the readme :laughing:

fstamour avatar Feb 26 '24 14:02 fstamour

I find the lag on (obsidian-jump) particularly annoying.

Here is my approach (which org-roam also uses) to soften this a bit, so at least jumping to existing notes is quick.

(defvar dima-obsidian-vault-path "~/Documents/Obsidian Vault/"
  "The directory to the Obsidian Vault.")

(defun dima-obsidian-update-after-save-hook ()
  (when (s-starts-with-p default-directory (f-expand dima-obsidian-vault-path))
    (message "Updating Obsidian Vault...")
    (obsidian-update)))

(add-hook 'after-save-hook #'dima-obsidian-update-after-save-hook)

Then create a custom (obsidian-jump) function like this.

Mine also invokes (obsidian-capture) when no match is found, which also mimics how org-roam does its find note.

(defun dima-obsidian-jump ()
    "Jump to Obsidian note.

Patches:
- remove slow call to `obsidian-update'
- when no target is there, create a new note"
    (interactive)
    (let ((gc-cons-threshold most-positive-fixnum))
      (let* ((files (obsidian-list-all-files))
             (dict (make-hash-table :test 'equal))
             (_ (-map (lambda (f) (puthash (file-relative-name f obsidian-directory) f dict)) files))
             (choices (-sort #'string< (-distinct (-concat (obsidian--all-aliases) (hash-table-keys dict)))))
             (choice (completing-read "Jump to: " choices))
             (target (obsidian--get-alias choice (gethash choice dict))))
        (if target
            (find-file target)
          ;; inline `obsidian-capture' without title variable
          (let* ((filename
                  (s-concat obsidian-directory "/" obsidian-inbox-directory "/" choice ".md"))
                 (clean-filename (s-replace "//" "/" filename)))
            (find-file (expand-file-name clean-filename) t)
            (save-buffer)
            (add-to-list 'obsidian-files-cache clean-filename))))))

Still, (obsidian-update) seems annoyingly slow to me with just 1500 Markdown files, so I will check the notes of @fstamour.

Dima-369 avatar May 19 '24 14:05 Dima-369

I've put together a prototype branch to experiment with trying to improve performance using some of the ideas from this thread. The main change is that the files cache is now a hashtable instead of a list, with the values being a metadata hash table holding the tags, aliases, and links.

{<filepath> : {'tags: (list of tags associated with file)
               'aliases: (list of aliases associated with file)
               'links: {<linked-file-name: (response from markdown-link-at-pos)}}}

This simple change ended up leading to many other changes as it could then made sense to use the information in the cache for many other functions. If you do a diff you'll see that I've changed a lot. It's definitely more than is necessary to change as much of this was just me renaming things, removing things, and moving things around to help me better understand how things works. This is definitely not a PR, I just made it a WIP PR to make it easier to view and see the diffs to help get feedback.

This hashtable cache is populated once, and then an 'after-save-hook' is added to repopulate things for the saved file to keep things up to date. I think this should work well for files modified within obisidan.el, but it won't pick up changes to files modified by other processes. The idea I implemented is to have a timer running that periodically checks the list of cached files against the list of files on disk, and then removes the files that no longer exists and adds any new ones. This should work for files added/deleted by other process, but not files modified by other processes. So I don't love this solution.

I did look at emacs's file-notify-add-watch, and I think it could work. But it doesn't recursively watch directories, so a new watcher would need to be created for each directory, and I wasn't sure if having a watcher per directory in a directory-heavy vault would issues. Probably something to test.

I've added a few more tests to help make sure I wasn't totally breaking things, but I'd definitely like to have some more that test more of the tags and alias functionality.

One things I'd like to explore more is whether it makes more sense to have a single data structure as the source of data for all functions, or whether parallel function-specific data structures should be maintained that are kept in sync with the files cache.

For example, we currently use a separate aliases hashtable that is used for alias-related functions. I did implement the necessary changes so that this hashtable is updated when the files cache is updated. But this does add additional complexity when it may be find to only generate the aliases hashtable when it's needed.

Similarly, we could have a tags data structure and a backlinks data structure. That would make retrieving this information, but it would certainly complicate the code with more happening on every update. A tags hashtable is used when jumping to a tag, and this is generated at the time of the call to find a tag. It's a simpler implementation at the cost of more computation happening at the time of a function call.

jayemar avatar Jul 19 '24 19:07 jayemar

I've been using that branch referenced above for the past few months and it's been working really well, so I cleaned up the code a bit and submitted it as PR #100. There are a lot of changes and I've been the only one banging on it, so I'd appreciate any feedback other may have.

It's very responsive and has even allowed me to create a backlinks panel to immediately show all of the backlinks when I open a new note file, although I didn't include that in the PR as I was trying to keep it as minimal as possible.

jayemar avatar Oct 07 '24 01:10 jayemar