helm-bibtex icon indicating copy to clipboard operation
helm-bibtex copied to clipboard

optimize bibtex-completion-prepare-entry

Open mpedramfar opened this issue 1 year ago • 6 comments

If there are a lot of folders to search through, especially if some of them are on the cloud, bibtex-completion-find-pdf can be slow. Currently, bibtex-completion-prepare-entry searches for the pdf files for all the entries. If bibtex-completion-pdf-symbol is the empty string, then it will still search for every single pdf file and then show nothing for it.

This pull request checks if bibtex-completion-pdf-symbol is the empty string, and if it is, it will skip searching for it. The same is done for notes and bibtex-completion-notes-symbol.

mpedramfar avatar Dec 18 '22 01:12 mpedramfar

Hi! Just to make sure there are no misunderstandings: I think you don't want to show PDF symbols in the results list but still be able to open PDFs? If correct, I see how this PR fixes your performance issue, but it seems like an extension that is very specific to your personal setup. And I don't understand why finding PDFs is so slow in your setup. Even with many folders it should be fast. My suspicion is that the best solution for this issue may be something else. Question: What do you mean when you say that some folders are in the cloud? Are these Dropbox folders or something like that? These should be cached locally and fast.

tmalsburg avatar Dec 18 '22 10:12 tmalsburg

Yes. I don't want to lose any functionality and I don't want any changes in the behavior (even when bibtex-completion-pdf-symbol is the empty string) except for speed.

It probably is a bit specific to my setup. However, if you use rclone to mount a cloud storage (Dropbox, Google Drive, One Drive, etc) or use sshfs to mount a remote folder, then even running ls to list the contents of a remote directory could be slow.

mpedramfar avatar Dec 18 '22 15:12 mpedramfar

This pull request also resolve #350 and provides a workaround for #226

mpedramfar avatar Dec 18 '22 23:12 mpedramfar

Not sure but I think a better solution might be to read the list of available PDFs once and then to check that list, not each file separately on disk. In general, we need a better approach to detecting PDFs and I have an idea for a plug-in approach that users could flexibly configure to their specific needs. Such a system would close half of all issues for this project or so. Perhaps I will find some time to work on this over the holidays but I can't promise it.

tmalsburg avatar Dec 19 '22 10:12 tmalsburg

How many folders to you have? Asking because it's useful for me to know what setups users have.

tmalsburg avatar Dec 19 '22 11:12 tmalsburg

I have less than 500 folders, but some of them are in the cloud (mounted network files, not cached) and I've also made some tweaks to other places in bibtex-completion so that my pdf files are not necessarily of the format citekey.*.pdf Because of this, the search becomes very slow. So, in a way, this PR helps with 2 issues for me:

  1. slow access to network mounted drive
  2. support for more sophisticated pdf file names (which inevitably slows down the search, no matter how it's implemented)

On the other hand, I'm not entirely sure how a plug-in approach could be different enough to not have the first issue. As I understand, if the new system is at least as capable as the current one, then this PR can help.

By capable, I mean this: Are we caching a list of pdf files, or are we detecting them? If we are caching the list, then we need to watch for any change in the filesystem. Except for local directories, this can be a bit of a hit and miss (e.g. see https://github.com/libfuse/libfuse/wiki/Fsnotify-and-FUSE and https://github.com/fsnotify/fsnotify#why-dont-notifications-work-with-nfs-smb-fuse-proc-or-sys). So, unless we wait until inotify support has been added for various FUSE and network filesystems, this is a loss in capability. One way to "watch" for changes is to periodically detect them, which brings us back to the issue of detecting. If we're going to detect them, then the cost of accessing the filessystem is always there, especially in network mounted folders. And this is exactly where this PR helps.

On the other hand, I can't think of how this change could be bad. Do you have a specific situation/plug-in system in mind where this change can be undesirable?

mpedramfar avatar Dec 19 '22 15:12 mpedramfar