Feature Request: change to Xapian for faster search

Open anthrolisp opened this issue 6 years ago • 15 comments

See https://github.com/hasu/notdeft

anthrolisp avatar Mar 02 '19 02:03 anthrolisp

Impressive stuff! And since deft can become quite slow with a large number of files, it would be a welcome change.

However, this would (1) be quite a change and (2) make zetteldeft much more difficult for people to adopt (since installing and configuring notdeft doesn't seem that straightforward).

In any case, I'd like to see someone else try to implement the zettelkasten system in this package. It won't be me, or at least not in the near future.

EDIT: After skimming the documentation, I must say this looks like a really powerful approach. Maybe this could fit in some very long term plans...

EFLS avatar Mar 02 '19 08:03 EFLS

I'm sure you are already aware of these, but I thought I would list them here as pointers to other possibilities. Many thanks in advance!

  • http://kitchingroup.cheme.cmu.edu/blog/2017/01/03/Find-stuff-in-org-mode-anywhere/
  • http://kitchingroup.cheme.cmu.edu/blog/2015/07/03/Using-swish-e-to-index-org-files-as-html/
  • https://oremacs.com/2015/07/27/counsel-recoll/
  • https://github.com/Wilfred/ag.el
  • https://github.com/syohex/emacs-helm-ag
  • https://github.com/abo-abo/swiper
  • https://github.com/stardiviner/org-seek.el
  • https://github.com/alraban/org-recoll
  • https://github.com/dfeich/helm-deft
  • https://www.reddit.com/r/emacs/comments/5rsxua/how_to_do_structured_search_on_thousands_of/

anthrolisp avatar Mar 07 '19 18:03 anthrolisp

It seems as though you could run notdeft alongside zetteldeft, and use it as a search engine for the zettelkasten without having to modify anything at all.

anthrolisp avatar Mar 08 '19 22:03 anthrolisp

Simply running it alongside deft would indeed be no issue, I think. Just like you can search and move through files in any way you like -- after all, zetteldeft is nothing more than plain text files in a single directory. However, the core functions of zetteldeft will still be using deft in the background, so on that front nothing really changes.

EFLS avatar Mar 09 '19 09:03 EFLS

FYI, installing notdeft is very straightforward, so long as you're not running Ubuntu!

anthrolisp avatar Mar 23 '19 17:03 anthrolisp

@EFLS Can you comment on the performance issue for a large number of files? Can you give an estimate of the slowness and a threshold for the number of files? I am quite hyped by the tutorial but would like to know how usable it is :)

Edit: I've only found this issue on deft's repo talking about it.

apraga avatar Jun 11 '20 14:06 apraga

I have only a little less than 500 notes in my Zetteldeft, but haven't noticed any slowness. What do you consider a large number?

There is an important thing to keep in mind here: the issues with regard to slowness are related to (1) Deft starting up or refreshing its database (and caching titles and content), and (2) incremental search.

Generally speaking, (1) only occurs once after starting up Emacs. And (2) can be circumvented by disabling interactive search (as suggested by jrblevins in the link you suggested).

Zetteldeft only relies on programmatic calls to deft-filter, so I don't expect too many issues with large sets of notes.
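As a rough illustration of what a programmatic call to deft-filter looks like, here is a minimal sketch. The helper name `my/deft-search` is hypothetical and not part of Deft or Zetteldeft; it only assumes that Deft is installed and that `deft-filter` accepts a string plus an optional reset flag.

```elisp
(require 'deft)

;; Hypothetical helper, not part of Deft or Zetteldeft.
(defun my/deft-search (query)
  "Open Deft and filter the file list on QUERY in one step."
  (interactive "sSearch notes for: ")
  ;; Switch to (or create) the *Deft* buffer.
  (deft)
  ;; Replace the whole filter string at once instead of typing it
  ;; character by character, which avoids the incremental search path.
  (deft-filter query t))
```

Calling `M-x my/deft-search` prompts once in the minibuffer, so the file list is filtered a single time rather than after every keystroke.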

EFLS avatar Jun 11 '20 14:06 EFLS

I've been testing Deft with a set of 10000 Markdown files containing over 8 million words (available here) and startup time does indeed increase considerably. The non-interactive search, however, remains relatively responsive. So I don't think there is a very big issue at the moment.
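For anyone who wants to measure this on their own notes, a sketch along these lines should give a rough number. The helper name `my/deft-time-startup` is hypothetical; the sketch assumes Deft is installed and uses `benchmark-run` from Emacs' built-in benchmark library.

```elisp
(require 'benchmark)
(require 'deft)

;; Hypothetical helper, not part of Deft.
(defun my/deft-time-startup ()
  "Report how long opening Deft takes.
Only meaningful when the *Deft* buffer does not exist yet,
since later calls reuse the existing buffer and cache."
  (interactive)
  ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-TIME); keep ELAPSED.
  (let ((elapsed (car (benchmark-run 1 (deft)))))
    (message "Deft startup took %.2f seconds" elapsed)))
```

Run it in a fresh Emacs session before opening Deft for the first time, otherwise the figure mostly reflects switching to an already populated buffer.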

EFLS avatar Jun 11 '20 14:06 EFLS

Thank you for the quick and detailed answer!

apraga avatar Jun 11 '20 15:06 apraga

The main Zettelkasten I use has about 4,000 files. It is slow at startup in the standard config. I have found that setting the deft-file-limit, though, works great. It only limits the number of files being displayed at a time, and greatly improves response times. This is what is in my use-package deft :config:

(setq deft-file-limit 200)

I don't have a reason to look at more than a screenful of candidates, the way I use it.
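In context, the setting would sit in a use-package form roughly like the sketch below. The `deft-directory` path is only a placeholder assumption and not taken from my actual config.

```elisp
(use-package deft
  :config
  ;; Placeholder path (assumption): point this at your own notes.
  (setq deft-directory "~/zettelkasten/")
  ;; Show at most 200 files in the Deft browser at a time.
  (setq deft-file-limit 200))
```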

gklimowicz avatar Jun 11 '20 15:06 gklimowicz

Hey, I'm no elisp programmer, but I've made a set of functions as a zettelkasten on top of notdeft, which deals with the 10000 markdown files with no problem. It's kind of like what EFLS did with deft, but it's way less polished. But it does have some extra window following features for viewing files in search.

It's not very well documented, and I'm hoping to improve that soon, but here it is in case anyone is interested: https://github.com/jd-m/notzettel

jd-m avatar Jul 05 '20 19:07 jd-m

Oh wow, that's awesome. I might have to take a look for inspiration for additional Zetteldeft features!

EFLS avatar Jul 08 '20 08:07 EFLS

Please do! It still needs a lot of work but for me it is usable.

jd-m avatar Jul 08 '20 17:07 jd-m

https://github.com/jd-m/notzettel

Am I the only one to think perhaps it should have been called notzetteldeft? :smile:

TRSx80 avatar Aug 09 '20 22:08 TRSx80

Thanks to all the people who shared their actual (vanilla) Deft performance (and especially, mitigation options), I no longer think this is as much of an issue as I did when I was first researching.

In addition to all the alternatives anthrolisp mentioned in this post (up thread), I had made a fairly in-depth study of related software (including performance, and other issues) that people might also want to take a look at here.

But the TL;DR at the end of it all is that, for now, I decided to just start using regular Zetteldeft (on top of regular, vanilla Deft). I am very confident, however, because I learned there are a number of additional tools available to grow into should I ever feel the need. For example, depending on what I found lacking, I would probably progress to incorporating one or more of the following tools, any of which can easily be added alongside / in concert with my current setup:

  • Recoll if I felt the need for more performant full text search (and want some additional Nice Things(TM) outside the context of Zettel/Deft). With your favorite completing-read type framework (Counsel/Ivy or Helm) you could come up with something very much like the Deft UI, I think (and I will probably start working on something like this at some point).
  • org-rifle if I want to do full text search, within the context of Org headings.
  • org-ql if I am doing Org based Zettel (as I am) and want to implement something to query certain PROPERTIES, etc., perhaps even using (optional) SQL-like language.

All of the above mentioned packages have the advantage (unlike notdeft) of being readily available as packages, either on (M)ELPA in the case of Emacs stuff, or distribution package managers in the case of Recoll. No need of compiling or any other dickering around. They also are more generally useful outside of strictly Zetteldeft context.
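To give an idea of the org-ql case, a property query over a directory of Org notes might look something like the sketch below. The directory `~/zettelkasten/`, the property name `TOPIC`, and its value are all made-up assumptions for illustration; the sketch only assumes org-ql is installed.

```elisp
(require 'org-ql)

;; Collect the headings of all entries whose (hypothetical) TOPIC
;; property is "emacs", across every .org file in the directory.
(org-ql-select (directory-files "~/zettelkasten/" t "\\.org\\'")
  '(property "TOPIC" "emacs")
  :action '(org-get-heading :no-tags))
```

This returns a list of heading strings and works on the plain files directly, so it can be used next to Deft without changing anything in the Zetteldeft setup.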

In fact I am pretty sure I am going to start using all three of them, sooner or later, whether with Zettel/Deft or not!

Cheers! :beers:

TRSx80 avatar Aug 25 '20 18:08 TRSx80