obsidian-dataview

Parser Performance: Use Obsidian's cache instead of parsing the markdown manually

Open Arcitec opened this issue 3 years ago • 8 comments

This is just to track and discuss a great idea which was hidden in a user comment in the following post:

https://github.com/blacksmithgu/obsidian-dataview/issues/811#issuecomment-1024369351

In short:

  • Obsidian already has a very detailed cache describing the markdown contents and structure of all documents.
  • There are still some things that Dataview needs to parse manually (such as its inline Key:: value support), but most of the data that Dataview needs is already cached by Obsidian.
  • The Obsidian cache contains values such as all frontmatter variables, all chunks, lists, status of tasks, all section headers, all tags, etc.
  • It may therefore be possible to use that cache instead to massively speed up Dataview's scanning, especially on mobile, where Dataview has been reported to run very slowly: https://github.com/blacksmithgu/obsidian-dataview/issues/602
  • If the Obsidian cache is used, Dataview may need a toggle that tells the user "using the Obsidian cache disables Key:: value support", since that data doesn't exist in Obsidian's own cache. Pretty much all the other values Dataview wants already exist in that pre-built cache.
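For illustration, here is a minimal sketch of what can be pulled straight out of the cache without reading the file at all. It is TypeScript (the language Obsidian plugins are written in), but the interfaces are a hand-written subset that only assumes the shape of Obsidian's `CachedMetadata`; a real plugin would import those types from the `obsidian` package and get the object from `app.metadataCache.getFileCache(file)`. The `summarizeCache` helper is hypothetical.

```typescript
// Minimal stand-ins for a subset of Obsidian's cache types (assumed shapes;
// a real plugin would import CachedMetadata and friends from "obsidian").
interface Loc { line: number; col: number; offset: number; }
interface Pos { start: Loc; end: Loc; }
interface TagCache { tag: string; position: Pos; }
interface HeadingCache { heading: string; level: number; position: Pos; }
interface ListItemCache { task?: string; parent: number; position: Pos; }
interface CachedMetadata {
  frontmatter?: Record<string, unknown>;
  tags?: TagCache[];
  headings?: HeadingCache[];
  listItems?: ListItemCache[];
}

// All the cache offers for a task: where it is and its checkbox character.
interface TaskStub { line: number; status: string; }

interface CacheSummary {
  frontmatterKeys: string[];
  tags: string[];
  headings: string[];
  tasks: TaskStub[];
}

// Hypothetical helper: collect everything available without reading the file.
function summarizeCache(meta: CachedMetadata): CacheSummary {
  return {
    frontmatterKeys: Object.keys(meta.frontmatter ?? {}),
    tags: (meta.tags ?? []).map((t) => t.tag),
    headings: (meta.headings ?? []).map((h) => h.heading),
    tasks: (meta.listItems ?? [])
      // Plain list items have no `task` field; tasks carry their status char.
      .filter((item) => item.task !== undefined)
      .map((item) => ({ line: item.position.start.line, status: item.task as string })),
  };
}
```

Note the gap: for tasks, the cache only yields a line number and a status character, not the task text itself.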

Arcitec, Jun 28 '22 01:06

Dataview does use the Obsidian cache, and uses it for fetching frontmatter, sections, and tasks+list elements. The "slow parsing" parts are to extract out inline fields (as you mention), as well as obtain task text and other task metadata since the Obsidian cache doesn't have anything other than line number & status.

It would be possible to have a fast-path option which skips all manual markdown parsing, which would probably be a pretty good speedup on mobile for first-time indexing, but it would also constrain you to using Dataview as essentially a frontmatter index with tag+folder support. I suppose that may still be pretty useful.
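A rough sketch of what querying such a cache-only index could look like, assuming the page records (path, tags, frontmatter) have already been filled in from Obsidian's metadata cache. `PageInfo`, `folderOf` and `queryFastPath` are hypothetical names, not part of Dataview's API.

```typescript
// Hypothetical cache-only page record: no file contents are ever read.
interface PageInfo {
  path: string;                          // e.g. "Projects/Alpha.md"
  tags: string[];                        // e.g. ["#project"]
  frontmatter: Record<string, unknown>;
}

// Folder of a vault path ("" for files at the vault root).
function folderOf(path: string): string {
  const slash = path.lastIndexOf("/");
  return slash === -1 ? "" : path.slice(0, slash);
}

// A "frontmatter index with tag+folder support": filter by an optional tag
// and an optional folder (including subfolders), then project one field.
function queryFastPath(
  pages: PageInfo[],
  opts: { tag?: string; folder?: string; field: string },
): Array<{ path: string; value: unknown }> {
  return pages
    .filter((p) => opts.tag === undefined || p.tags.includes(opts.tag))
    .filter((p) => {
      if (opts.folder === undefined) return true;
      const f = folderOf(p.path);
      // Match the folder itself or anything below it, but not "ProjectsX".
      return f === opts.folder || f.startsWith(opts.folder + "/");
    })
    .map((p) => ({ path: p.path, value: p.frontmatter[opts.field] }));
}
```

Everything here comes from data Obsidian already maintains, which is why such a mode could skip reading files entirely; the trade-off is that task text and inline fields are simply not available.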

blacksmithgu, Jun 28 '22 02:06

> Dataview does use the Obsidian cache, and uses it for fetching frontmatter, sections, and tasks+list elements.

Ahh, excellent! :)

> The "slow parsing" parts are to extract out inline fields (as you mention), as well as obtain task text and other task metadata since the Obsidian cache doesn't have anything other than line number & status.

Does Dataview cache/index the parsed data, or does it re-calculate them on every Dataview query?

> It would be possible to have a fast-path option which skips all manual markdown parsing, which would probably be a pretty good speedup on mobile for first-time indexing, but it would also constrain you to using Dataview as essentially a frontmatter index with tag+folder support. I suppose that may still be pretty useful.

It would be useful in some cases, but don't sweat over it too much, since I suppose most people want to see matching tasks too.

There may be other ways to speed up mobile, such as a WASM module for parsing.

I have another idea, but it may be too limiting or silly:

  • Having a "light" mode where it relies only on Obsidian's own index,
  • except for documents with a frontmatter value such as dataview-parse: true, so that people can enable full Dataview parsing only for the documents where they need things like task-list details.

But that may just be a silly idea that few people would end up using, since it would be a bit clunky.
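For what it's worth, the opt-in check itself would be tiny. A minimal sketch, where the dataview-parse key comes from the suggestion above, `parseModeFor` and `filesNeedingFullParse` are hypothetical helpers (not an existing Dataview setting), and the frontmatter objects are assumed to come from Obsidian's metadata cache:

```typescript
type ParseMode = "cache-only" | "full";

// Hypothetical per-file decision for the proposed "light" mode: rely on the
// metadata cache unless the note opts in with `dataview-parse: true`.
function parseModeFor(frontmatter: Record<string, unknown> | undefined): ParseMode {
  return frontmatter?.["dataview-parse"] === true ? "full" : "cache-only";
}

// Only the opted-in notes would ever have their text read and parsed.
function filesNeedingFullParse(
  files: Array<{ path: string; frontmatter?: Record<string, unknown> }>,
): string[] {
  return files.filter((f) => parseModeFor(f.frontmatter) === "full").map((f) => f.path);
}
```

Comparing against the boolean true rather than any truthy value means a string such as dataview-parse: "full" does not switch on full parsing by accident.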

 

Edit: This gives me another idea:

  • How about a "light" mode that only parses the lines Obsidian has marked as containing tasks, ignoring inline Key:: value fields everywhere else in the document? In fact, it could probably ignore inline values even inside the tasks, just to speed up parsing, and simply extract the task text and its tags.
  • So Dataview would just have to open the document, split it into lines, extract the exact line range of each task, parse that, and exit, instead of scanning the entire document.

This may just be a silly idea too, but perhaps it would be useful. Personally, I don't need inline-values at all, and only care about Dataview quickly finding tasks and their tags.
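Here is a minimal sketch of that light mode. It assumes a subset of Obsidian's `ListItemCache` shape (`task` holds the checkbox character and is absent for plain list items) and assumes each item's cached position covers only that item's own lines. `extractTasksLight` and `LightTask` are hypothetical names, and the tag regex is deliberately rough.

```typescript
// Assumed subset of Obsidian's ListItemCache: `task` is the checkbox
// character (" ", "x", ...) and is absent for plain list items.
interface ListItemCache {
  task?: string;
  position: { start: { line: number }; end: { line: number } };
}

interface LightTask {
  line: number;
  status: string;
  text: string;
  tags: string[];
}

// Hypothetical "light mode": only the line ranges the cache marks as tasks
// are looked at; inline `Key:: value` fields are deliberately not parsed.
function extractTasksLight(fileText: string, items: ListItemCache[]): LightTask[] {
  const lines = fileText.split(/\r?\n/);
  const tasks: LightTask[] = [];
  for (const item of items) {
    if (item.task === undefined) continue; // plain bullet, not a task
    const { start, end } = item.position;
    // A task can wrap onto continuation lines; join its whole line range.
    const raw = lines.slice(start.line, end.line + 1).map((l) => l.trim()).join(" ");
    // Strip the list marker and checkbox, e.g. "- [ ] " or "1. [x] ".
    const text = raw.replace(/^\s*(?:[-*+]|\d+[.)])\s+\[.\]\s*/, "").trim();
    // Very rough tag match: "#" followed by word characters, "/" or "-".
    const tags: string[] = text.match(/#[\w\/-]+/g) ?? [];
    tasks.push({ line: start.line, status: item.task, text, tags });
  }
  return tasks;
}
```

The rest of the file is split but never scanned, so the per-file cost scales with the number of task lines rather than the size of the note.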

Arcitec, Jun 28 '22 02:06

> Does Dataview cache/index the parsed data, or does it re-calculate them on every Dataview query?

Dataview caches parsed data and only regenerates it when the page itself changes; it saves this metadata to a browser cache across Obsidian restarts as well to speed up startup time. Generally mobile performance is ok after the really long initialization time, but I can understand it being annoying.
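For anyone curious, the invalidation rule described here boils down to something like the in-memory sketch below. It is not Dataview's actual code (which also persists entries across restarts); `ParseCache` is a hypothetical class, and keying on path plus modification time is an assumption about what "the page changes" means.

```typescript
// Hypothetical in-memory version of "only re-parse when the page changes":
// entries are keyed by file path and tagged with the file's modification time.
interface CacheEntry<T> { mtime: number; value: T; }

class ParseCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly parse: (text: string) => T;
  parses = 0; // how many times the (expensive) parser actually ran

  constructor(parse: (text: string) => T) {
    this.parse = parse;
  }

  // `readText` is only called on a miss, so a hit skips the file read too.
  get(path: string, mtime: number, readText: () => string): T {
    const hit = this.entries.get(path);
    if (hit !== undefined && hit.mtime === mtime) return hit.value; // unchanged
    const value = this.parse(readText());
    this.parses++;
    this.entries.set(path, { mtime, value });
    return value;
  }

  // Drop an entry, e.g. when the file is deleted or renamed.
  evict(path: string): void {
    this.entries.delete(path);
  }
}
```

This also shows why first-time indexing is the slow part: every file misses once, and only later edits pay the parsing cost again.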

> There may be other ways to speed up mobile, such as a WASM module for parsing.

I'd love to use WASM, but WASM support for the languages I would use (Rust, Kotlin, etc.) is still pretty experimental and interacts weirdly with worker threads. Writing parsers in C/C++ is an exercise in misery :)

> Having a "light" mode where it relies only on Obsidian's own index, except for dataview-parse: true.

I think your second recommendation of just skipping everything except task text in "light mode" would probably be a good middle ground, since that could just be implemented as "enable/disable inline fields".
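A sketch of what an "inline fields on/off" switch could gate, assuming a simplified syntax where only whole-line Key:: value fields are recognized (Dataview also supports bracketed forms such as [key:: value], which this ignores). `IndexSettings` and `parseInlineFields` are hypothetical names.

```typescript
// Hypothetical settings object; the toggle name mirrors this discussion.
interface IndexSettings { inlineFields: boolean; }

// Rough matcher for a whole-line `Key:: value` field. The key may not contain
// ":", so things like URLs ("http://...") do not match.
const INLINE_FIELD = /^\s*([^:\s][^:]*?)::\s*(.*)$/;

function parseInlineFields(text: string, settings: IndexSettings): Map<string, string> {
  const fields = new Map<string, string>();
  if (!settings.inlineFields) return fields; // toggle off: skip the scan entirely
  for (const line of text.split(/\r?\n/)) {
    const m = INLINE_FIELD.exec(line);
    if (m !== null) fields.set(m[1].trim(), m[2].trim());
  }
  return fields;
}
```

With the toggle off, the function returns before splitting the text, so the full-document scan is skipped rather than just its results being discarded.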

blacksmithgu, Jun 28 '22 03:06

> Dataview caches parsed data and only regenerates it when the page itself changes

Fantastic. So then the bottleneck on mobile is the indexing itself.

> I think your second recommendation of just skipping everything except task text in "light mode" would probably be a good middle ground, since that could just be implemented as "enable/disable inline fields".

Good idea! Labeling it "Disable Inline Fields" would be a great way to speed up parsing while also making more sense to users than calling it "light mode". The description underneath the setting could briefly mention the performance benefit too, and note that it helps on mobile.

Hopefully it's not even that difficult to implement, since it's basically the exact same parsing as now, but narrowed to a smaller line-range, and ignoring inline fields.

The real question becomes: What's the actual bottleneck on mobile? Is it opening the file? Or is it parsing the file? If the slowdown is caused by mobile filesystems being slow at opening files, then this performance tweak would be moot. :) So before investing time into it, perhaps try a mobile build which just strips out most of the parsing to see if that actually helps.
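One cheap way to answer that is to time the two phases separately over a batch of files. This is a hypothetical helper, not anything in Dataview; in a plugin, `read` would wrap something like `app.vault.cachedRead(file)` and `clock` could be `() => performance.now()`. The clock is a parameter so the helper can also be run with fake timings.

```typescript
interface PhaseTimings { readMs: number; parseMs: number; }

// Hypothetical profiling helper: times file reading and parsing separately
// over a batch of files, to see which phase dominates on a given device.
async function timePhases<T>(
  paths: string[],
  read: (path: string) => Promise<string>, // e.g. a wrapper around a vault read
  parse: (text: string) => T,
  clock: () => number = () => Date.now(), // swap in performance.now() on device
): Promise<PhaseTimings> {
  let readMs = 0;
  let parseMs = 0;
  // Files are processed one at a time so the two phases never overlap.
  for (const path of paths) {
    const t0 = clock();
    // Awaited time also includes any other work the event loop runs meanwhile.
    const text = await read(path);
    const t1 = clock();
    parse(text);
    const t2 = clock();
    readMs += t1 - t0;
    parseMs += t2 - t1;
  }
  return { readMs, parseMs };
}
```

If readMs dominates on a phone, narrowing the parsing would not buy much; if parseMs dominates, the "disable inline fields" idea is worth pursuing.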

Arcitec, Jun 28 '22 03:06

It's a good question - I'm a chronic laptop user and so I don't use Obsidian Mobile enough to be able to even guess at performance bottlenecks. I guess it's time to actually install my own plugin on a phone and see if I can attach chrome debugging tools to it.

blacksmithgu, Jun 28 '22 03:06

@blacksmithgu It might be possible to attach a Bluetooth keyboard and press Ctrl-Shift-I, if you have one.

Alternatively, enable Developer mode on your phone (go into Android Settings, then About, and tap the build number 7 times, if I remember right). Then go into the phone's Developer options and enable USB Debugging/ADB. Attach a USB cable, and when the phone asks what you want to do, choose "Transfer files" (this enables the debug connection).

Finally, run this application: https://github.com/Genymobile/scrcpy, which displays the Android device on your laptop's screen and lets you type into it with your regular laptop keyboard. It's a very convenient way to control the device in general. There are hotkeys for things like "back" and "home"; I think they're Alt-B and Alt-H, but I can't remember for sure.

Hopefully one of those methods lets you open the Chrome developer tools inside Obsidian on Android. I just realized I assumed you use Android. If you use iOS, it has its own "screen mirror/remote control" feature, so it's probably doable there too.

Arcitec, Jun 28 '22 04:06

Found this detailed guide for debugging Obsidian Mobile:

https://keathmilligan.net/obsidian-plugin-cross-platform-testing#mobile-testing

Arcitec, Jun 28 '22 04:06

I also wrote a guide to debugging, which covers some ideas not mentioned in the article above:

https://mnaoumov.wordpress.com/2022/05/10/how-to-debug-obsidian-plugins/

mnaoumov, Jul 12 '22 17:07