markdown-links
Performance on large directories is bad
I have a project directory that includes a node_modules directory with a couple thousand markdown files. While I believe that a few thousand markdown files should eventually be a doable task, in the short term, is there perhaps a way I could tell markdown-links to ignore some folders? If I move this folder out of my directory tree and restart, it handles the remaining 400 or so files admirably.
In fact, the whole Foam stack, including this repo and the vscode-markdown-notes extension, can then successfully handle my ~450,000-file Dropbox with only those 400 markdown files (although, for some reason, only 285 files are reported in the corner of the graph view), which Zettlr can't yet do. Some tasks, like finding backlinks, are still a bit slow, though.
Ignoring files is in progress. @ingalless started a PR which should significantly help with it. Other than that, we are planning to respect the .gitignore file, which should also handle node_modules in all or close to all cases.
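For reference, here is a minimal sketch of that kind of filtering, assuming the ignore npm package is used to apply the .gitignore rules; the helper below is illustrative, not what the PR actually implements:

```ts
import * as fs from "fs";
import * as path from "path";
import ignore from "ignore";

// Build a filter from the workspace's .gitignore (plus node_modules as a fallback).
function createIgnoreFilter(workspaceRoot: string): (absolutePath: string) => boolean {
  const ig = ignore().add("node_modules");
  const gitignorePath = path.join(workspaceRoot, ".gitignore");
  if (fs.existsSync(gitignorePath)) {
    ig.add(fs.readFileSync(gitignorePath, "utf8"));
  }
  // ignore() expects paths relative to the root, using forward slashes.
  return (absolutePath) =>
    !ig.ignores(path.relative(workspaceRoot, absolutePath).split(path.sep).join("/"));
}

// Usage: keep only the markdown files that are not ignored.
// const keep = createIgnoreFilter(workspaceRoot);
// const files = allMarkdownFiles.filter(keep);
```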
Regarding files that are not reported: they might not have been discovered, which most likely means they are missing a Markdown H1 title. Requiring one was an early assumption that no longer holds for many projects, and we are gathering feedback on improving it. See #28.
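As an illustration only, a hypothetical title helper could fall back to the file name when no H1 is present, which is one possible way to relax that assumption:

```ts
import * as path from "path";

// Hypothetical title extraction: prefer the first ATX H1, otherwise fall back
// to the file name so the note still shows up as a node.
function extractTitle(markdown: string, filePath: string): string {
  for (const line of markdown.split(/\r?\n/)) {
    const match = line.match(/^#\s+(.+)/);
    if (match) {
      return match[1].trim();
    }
  }
  return path.basename(filePath, path.extname(filePath));
}
```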
Thanks for reporting the problem!
I have the same issue: I've loaded about 1200 files (which will be expanded to at least 3000 soon), the graph takes minutes to load, and it is extremely laggy when zooming and panning.
Is it possible, at the very least, to save it all so that start-up is quick? I assume the slowness in general usage means that an entirely different way of generating it is needed. For example, Obsidian uses D3 and Pixi.js and the same files perform flawlessly in their graph.
Do you know if they are using d3 just for data manipulation, or d3-force for simulation too?
Here we currently use d3-force, with the result displayed and manipulated as an SVG. Moving it to WebGL would definitely help with performance. I have some experience with this exact kind of thing, so I will try playing with it.
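To sketch what moving off SVG could look like: a d3-force simulation can drive a plain 2D canvas (a WebGL layer such as PixiJS would follow the same pattern with a faster draw step); the data below is made up for illustration:

```ts
import {
  forceSimulation,
  forceLink,
  forceManyBody,
  forceCenter,
  SimulationNodeDatum,
  SimulationLinkDatum,
} from "d3-force";

interface GraphNode extends SimulationNodeDatum {
  id: string;
}
type GraphLink = SimulationLinkDatum<GraphNode>;

// Illustrative data shape: one node per note, one link per markdown link.
const nodes: GraphNode[] = [{ id: "a.md" }, { id: "b.md" }];
const links: GraphLink[] = [{ source: "a.md", target: "b.md" }];

const canvas = document.querySelector("canvas") as HTMLCanvasElement;
const ctx = canvas.getContext("2d")!;

const simulation = forceSimulation(nodes)
  .force("link", forceLink<GraphNode, GraphLink>(links).id((d) => d.id))
  .force("charge", forceManyBody())
  .force("center", forceCenter(canvas.width / 2, canvas.height / 2));

// Instead of updating thousands of SVG elements per tick, redraw the whole
// canvas once per tick, which is usually much cheaper at this scale.
simulation.on("tick", () => {
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  ctx.strokeStyle = "#999";
  for (const link of links) {
    const s = link.source as GraphNode;
    const t = link.target as GraphNode;
    ctx.beginPath();
    ctx.moveTo(s.x!, s.y!);
    ctx.lineTo(t.x!, t.y!);
    ctx.stroke();
  }
  ctx.fillStyle = "#4a90d9";
  for (const node of nodes) {
    ctx.beginPath();
    ctx.arc(node.x!, node.y!, 4, 0, 2 * Math.PI);
    ctx.fill();
  }
});
```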
Is it possible, at the very least, to save it all so that start-up is quick?
You mean save the previous graph data so it might start from the cache and just do the simulation step? Actually, I have never measured performance so the first thing would be to check what is really causing the slowness here.
I don't know what they do exactly, but am pretty sure they use d3-force. Seems like WebGL or Canvas would be a good solution to the data scale issue, so long as they don't come with other big tradeoffs.
You mean save the previous graph data so it might start from the cache and just do the simulation step? Actually, I have never measured performance so the first thing would be to check what is really causing the slowness here.
Yes, I figured it would be worthwhile to cache the data rather than rebuild. I have no idea about any of this, but your idea to check where the bottleneck actually is sounds like a good plan!
Here's the dataset that I was using: https://github.com/nickmilo/IMF-v3 (it's a great method/template to look through and learn from anyway!)
@nixsee @tchayen Worth mentioning: performance is not the only reason that you might want to cache node locations. It's also valuable to encourage some consistency in their location from run to run, just to improve usability.
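To make the caching idea concrete, here is a rough sketch of persisting node positions via the extension's workspaceState between sessions; the cached shape and the seeding step are assumptions, not how markdown-links currently works:

```ts
import * as vscode from "vscode";

// Hypothetical cached shape: last known position per node id (file path).
interface CachedPositions {
  [id: string]: { x: number; y: number };
}

const CACHE_KEY = "markdown-links.nodePositions";

function loadPositions(context: vscode.ExtensionContext): CachedPositions {
  return context.workspaceState.get<CachedPositions>(CACHE_KEY, {});
}

function savePositions(
  context: vscode.ExtensionContext,
  nodes: { id: string; x: number; y: number }[]
): Thenable<void> {
  const cache: CachedPositions = {};
  for (const node of nodes) {
    cache[node.id] = { x: node.x, y: node.y };
  }
  return context.workspaceState.update(CACHE_KEY, cache);
}

// Seeding d3-force nodes with cached coordinates lets the simulation start
// near its previous layout, so it settles faster and nodes stay roughly
// where the user last saw them.
function seedNodes(
  nodes: { id: string; x?: number; y?: number }[],
  cache: CachedPositions
): void {
  for (const node of nodes) {
    const cached = cache[node.id];
    if (cached) {
      node.x = cached.x;
      node.y = cached.y;
    }
  }
}
```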
Just leaving another option I've been mulling over:
What about a config option for large datasets that restricts graph parsing to only scan links for the currently open file, perhaps only 1-2 nodes out? You would lose being able to see the whole graph, and perhaps caching is a better option, but I thought I'd drop this here.
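Roughly, the traversal would be a depth-limited walk from the active note; the adjacency map below is hypothetical, just to illustrate the 1-2 hop cut-off:

```ts
// Hypothetical adjacency map: note path -> paths it links to.
type LinkIndex = Map<string, string[]>;

// Collect every note reachable from the active note within maxDepth hops,
// so only that neighborhood needs to be parsed and rendered.
function neighborhood(index: LinkIndex, start: string, maxDepth: number): Set<string> {
  const visited = new Set<string>([start]);
  let frontier = [start];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbor of index.get(id) ?? []) {
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return visited;
}

// Example: neighborhood(index, activeNotePath, 2) keeps the active note plus
// everything at most two links away.
```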
@ingalless How does this differ from @ianjsikes's Focus graph? Ultimately I prefer that view, but full graph mode is also useful at times.
Aside from probably needing to change a lot of code, what would be the drawback to using WebGL/canvas for better performance?
@nixsee part of the performance issue seems to be not only rendering the graph, which changing the rendering engine would solve, but also parsing all the markdown files in a project to gather all the edges and nodes.
@ianjsikes's branch, to my knowledge, doesn't actually change how markdown-links parses the files. It "hides" nodes that aren't relevant, but do correct me if that understanding is wrong! The proposed solution is the ability to selectively switch to an "on demand" mode, as it were, where processing would only happen on node change, and then only for 1-2 edges out.
A combined solution would attack both performance bottlenecks, although switching to WebGL alone would allow those who are fine waiting for all the nodes to load to have a smooth experience.
I haven't checked, but could you even batch-load nodes so that the user can see something whilst distant nodes are loading? A rough sketch of what I mean is below.
This is an alternative, or even a complementary, solution to the issue, as "fix performance" is very general.
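For illustration, batching could look something like this: parse files in small chunks and yield to the event loop between chunks, so earlier results can already be shown (parseFile and onBatchParsed are hypothetical callbacks, not existing APIs):

```ts
// Hypothetical incremental parse loop: handle files in small batches and
// yield between batches so earlier results can already be displayed.
async function parseInBatches<T>(
  files: string[],
  parseFile: (file: string) => Promise<T>,
  onBatchParsed: (parsed: T[]) => void,
  batchSize = 50
): Promise<void> {
  for (let i = 0; i < files.length; i += batchSize) {
    const batch = files.slice(i, i + batchSize);
    const parsed = await Promise.all(batch.map(parseFile));
    onBatchParsed(parsed);
    // Give the event loop a chance to run timers, I/O, and UI updates.
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
}

// Usage sketch: stream nodes/edges into the graph as each batch completes.
// await parseInBatches(markdownFiles, parseMarkdownFile, (batch) => graph.add(batch));
```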
@ingalless I don't have any insight into how any of this works, so I assume you are correct; I was merely asking! I'll leave it to all you folks to figure out the best way forward. But it would seem to me that, in addition to an adjustable-distance Focus mode, being able to efficiently load and smoothly work with the full graph would be a desirable goal. Obsidian uses d3js and PixiJS for its rendering, and it works pretty quickly and smoothly with a couple thousand nodes.
I can confirm: 1200 .md files take around 20 seconds to render as a graph, and the result is already painful to interact with. There are no other files in the directory. All the ideas I have were already mentioned above: caching, and a smart scan from the current node to its neighbors (if it is possible to re-render nodes in real time, of course).