d3-dag
d3-dag copied to clipboard
Moderate sized d3-dag layouts seem crowded and harder to follow in comparison to graphviz
Erik: I'm going to go into more detail on this in the "improve for interactivity" issue, but one of my anticipated workflows using d3-dag is starting with a large, complex layout and doing a combination of zooming in, editing, and manipulating it to make different subsets of the layout clearer. For this purpose, the larger layouts should be as clear as possible to begin with :-}. Possibly due to something I'm missing about configuration, the layout I'm currently using is hard to follow compared to what's a pretty close to equivalent graphviz layout. Naively, it looks like graphviz does a better job of keeping lines between nodes distant from each other, so it's easier to see which nodes are connected without carefully tracing each line (and guessing at directions in line clusters). Do you have a sense as to whether this has something to do with d3-dag's internal layout algorithms, or with a configuration parameter I'm not adjusting properly? I did try playing with nodesize (including the nodesize for the virtual nodes laid down for placing lines), but that doesn't seem to help.
My current working observable notebook (for the d3-dag layout) is https://observablehq.com/d/32c3b2a8612fda16. Zooming and panning works in the main display field, and will be necessary to look at the different portions of the layout. My comparison graphviz layout (which may be off by a node or two, but is pretty close to the same data) is attached.
I clicked your link, but it still took me to an old version. Here's a fork that is slightly edited, to tweak spacing, remove arrows and always use the right layout. When you set fraction of nodes to 60%, roughly the same size graph as in above, you get the following:
I don't think this is terrible, but it's not ideal, and definitely not as pretty as graph-viz. There are a few things I notice that could be improved:
- the default and generally most pleasing layering is
simplex
. However, this compresses the graph vertically, and that often causes more necessary crossings. This graphviz layout above has two layers with a single node that could compressed. Strangely in this instance, compressing them wouldn't hurt the number of crossings, so I'm not sure why the layout looks this way. Still, it's worth exploring if simplex can be either augmented to allow longer edges if it will help crossings, or alternatively looking at different layering strategies. There's probably some literature from graphviz about how they do it, so that's worth a look. - The graph viz layout has much fewer crossings. I tried to use the optimal decrossing algorithm, but even with this many nodes, the graph is too big. On thing I never implemented because it never seemed necessary, but it might help here and may be the first thing I try is greedy refinement for the twolayer approach. This basically looks to see if it can do any edge swaps to reduce edge crossings. This is slow, but should probably work, and be relatively easy to implement. There may also be other strategies mentioned from how graphviz works.
- Side note: some of these may take longer. One thing I've played around with, but haven't used too much is webassemply on these parts that should take longer. I think for the time being I'm against including wasm in this library due to the difficult of bundling, but if these new strategies work but are too slow, would you be willing to incorporate wasm as a way to substitute faster routines?
- The graphviz layout has better edge handling. The current edge layout only makes sure there's a gap on the center of a row, but curve interpolation can undo that for corners etc. There's also edges that curve up, which d3-dag won't do for various reasons. I don't love having to incorporate shape information into the layout, so I'm probably not going to touch aspects of this for a while, but I could imagine a fourth step routine, or separate function that moves edges out of shapes after the coordinate assignment. Doing this properly would probably mean modeling the edges as beziers and doing a constrained optimization. Potentially graphviz will have more to say about this, but this is the last thing I want to touch, Open to your thoughts here.
Summary
Thanks for providing a counter example. It's hard to improve things without a reference, so this should help. I'm probably going to start with 2, while I look into graphviz more. If you notice anything else here, that you think would be more important, chime in.
My apologies, I think I forgot to re-publish. You should be able to see the update now. But your plans seem good to me regardless.
I'm willing to look at WASM, but I share your preference against it, partially due to bundling issues, partially due to security issues (which may not exist--I haven't looked at the security model for it), and partially because it seems worthwhile understanding the underlying performance difference between Graphviz and d3-dag first before reaching for the big hammer. Maybe it's that one is written low-level and one is written in JS, but it seems worthwhile to rule out other possibilities before assuming that one. But I haven't looked at either program from a performance point of view, so I could be completely off base.
you added some nice zoom and sizing support. I'm curious to see how much better decrossing will help. I think it should help some, but part of me thinks that it'll need the extra vertical space from a different layering. It also raises the question about setting node size based off of the layout, but that seems like a later bridge.
I forget that I implemented two-layer opt which is always going to be better than greedy swapping. Adding that to the layout does help, but it's still a little more compact than necessary meaning I need to read up on layering. The edge rendering also hurts a bit, but clearly the layout just needs more layers to play with.
Looking at this again though, there's also some decrossing that twolayer will never achieve, but the current optimizer chokes on...
Just bumping with an update. I haven't had time to look at this, and was hitting the easier targets first. I think this still rests at I need to look more at graphviz / read more papers. I did implement greedy edge swapping, but as noted above, it will do no better than opt, and that still has some issues.
Good and bad news. After reading about how graphviz works, at least from a description standpoint, we're doing close to the same thing. Their layering is a bit different, but should mostly resolve the same way. As is their decrossing, although maybe I can take better advantage of some of their heuristics. The coordinate assignment is a larger difference, they picked a different objective function that should be reasonable to try. Finally, graphviz does fancy post processing for the line splines. I don't think the last part is super necessary, and I really want to keep splines to d3, so I'm going to focus first on the better coordinate assignment.
Also, to help with this general task, I'm going to pick a large enough graph to test on until I'm close to what graphviz is doing. If you have a candidate other than the one above, send it over.
Aside: ports : from looking into it, the way they do ports is as an extension to coordinate assignment. That makes coordinate assignment really complicated, so I'm going to hold off for now. Is there behavior you specifically want, because I feel like it should be possible to make the link intersect with where you want via something like the intersection function. Some of the coordinates will be a little off, but it shouldn't harm layout that poorly.
Candidate large graph: The above's probably a good place to start, as I'm fairly sure it could be laid out in a more understandable way. If you want a larger graph, we could use the master graph that I'm doing subsets of in my interactive explorer; it wouldn't take me very long to just create a d3-dag input for all of it. Having said that, I haven't even tried to display that in graphviz, and I suspect it's noticeably bigger than sense could easily be made of. But if you want a challenge ... :-}
WRT ports, it isn't high priority. But some of the "shortcuts" I've used with graphviz have involved a subset of lines coming into a node converging on a specific point on that node (basically, dividing incoming arcs into n subsets that converge on n different points). I could imagine doing something like that with the intersection library we were talking about, but the problem is that if one of the lines is coming from a substantially different place than the others, the edited line may reach the port "through" the node, which would make the association basically invisible or scribble on top of the node. I think this needs to be solved at the layering and dummy node stage to make sure all the lines are coming in from basically the same direction as the port exists on the node. Which I suspect is pretty difficult :-J.
(Let me know if you want me to spell out my example use case in more detail--it's not directly relevant to the interface or implementation, but it might provide context/motivation.)
WRT the graph, sounds good.
WRT ports, I think I understand, although it's not clear how to actually make the ports go the opposite way on a node. I'm not sure if or how graphviz handles it, but it likely involves the spline trickery they do, which I'm trying to avoid so we'll see. I like the idea of the intersection callback. There may be a way we can tweak the idea to make it easy to incroporate into coordinate assignment later on.
Also WRT to ports / intersection. I think your idea probably makes sense to just include in the library, so if you want to put together just an example of what you'd call and what it'd produce, I should be able to implement / discuss from there (maybe in its own issue for tracking)
WRT ports: Your comment made me curious, and I threw together the following graphviz output to see what graphviz would do with it:
digraph test { A; C; D; E;
A -> D:se;
A -> D:sw;
A -> E:se;
A -> E:sw;
C -> D:se;
C -> D:sw;
C -> E:se;
C -> E:sw;
}
Result attached:
Specifically looking at the C-> D:sw link, it looks like graphviz is doing something funky in the layout process, though maybe it's just laying down the splines specially after the layout process. The incoming angle at D:sw doesn't look like it leads to node center, though I could imagine different reasons for that.
WRT intersection, sounds good, I'll put together an example around the intersection idea and link to it from a new issue. Thanks!
Thanks for putting together an example. This is definitely only possible with graphviz's extra custom spline step. To get features like this working well, it's definitely going to require something to that effect, and the question is if I really want to support it. Part of the benefit of building with relation to d3 was to just use simple d3 spline code. It's still definitely worth having a place to discuss ports, but I feel like your intersection proposal will make it reasonably easy to support 90% of use cases and have the lines looking 90% reasonable. I also think a few tweaks could boost that number.
I'm very solidly in favor of simplicity, especially in the context of d3--having small, easily composable pieces strikes me as maximizing usability. So, yeah, let's not worry about ports for the moment. At some point, I may come back to the shortcuts in representation and see if I can come up with different shortcuts that make sense in the d3-dag (with intersection library) context.
Erik: I just ran into a graph which is pretty simple, but is, to my eye, has a pretty glaring layout problem, and I figured I'd share in case it's useful. I think all the information is in the picture, so I'm just attaching a screenshot, but it'd be pretty easy for me to create a notebook with this in it if that's more useful.
The placement of "Iron Plate" is the thing I'm looking at. I have the impression that d3-dag always puts nodes without parents in the first layer, but I've thought a couple of times that it might be useful to put them as close to their children's layers as possible, and this is one of those cases; there would be less need for decrossing if "Iron Plate" was in L5 rather than L1 (counting from the top). But even assuming L1 is best (and I could imagine that depends a lot on what kind of data is being laid out), why isn't it all the way over on the left? The location chosen seems almost worst case?
I'm using d3.dagStratify.decycle(true) ->sugiyama_config and otherwise taking the defaults. Glancing through the docs, I suspect I could get some improvement by using a bottom-up decrossing algorithm, but I didn't see at a quick glance the right way to customize for that to try it. Worst case, this can be grist for the mill as to what the best defaults should be :-}.
@curiouserrandy this is a bug with multidag support for sugiyama. I just found it when testing some improvements to large dags. I think the coordinate assigment that dot uses might help this significantly, but I first have to debug why iron plate is up that high instead of right above uranium fuel cell. If check 0.9.1 it work as expected.
Until I figure this out, I would say don't use the package I uploaded earlier.
To your second point, two-layer decrossing by default does one down then one up pass, so you're probably already running it.
I just released version 0.10.0
which changes the sugiyama default to use both twolayer greedy and the simplex coordinate assignment, which I think helps things a bit. I also fixed the bug you were running into above. Comparing larger graphs with dot, it:
- seems like the decrossing could still be a bit better. I think this is because dot tries a few greedy approaches, but then checks if they actually improve the overall number of of crossings. dot also has a different way to handle nodes with no parents in the two layer format, which is worth experimenting with. Both of these can be done without breaking backwards compatibility, so I went ahead and released a new version.
- Their custom spline rendering helps a bit, so it's likely worth exploring a bit more, but I'm still hesitant and think that decrossing can be better first.
There's still no support for ports, and now that I've reached this stage, the necessity of better spline interpolation is present. This current version has some better decrossing heuristics taken from the graphviz paper, and seems to generally do better / is probably at the point that I'm happy. Given that I changed the sugiyama defaults again, I'm probably going to have to bump a "major" version, but I might not as it should just improve layouts with a small enough time and memory penalty to not matter for most use cases.
I'm inclined to consider this done, as I'm not sure I want to add graphiviz style splines, and I'm not sure what else I could do to make it better. Curious about your thoughts though.
Oh, that's much better. Thank you. I look forward to playing with it.
On an abstract level, I'm torn about how much effort it's worth improving the layout algorithm. I think some (else I wouldn't have filed this issue) but there's a level on which there's simply a tradeoff between the interactivity required (either pair down the graph until the meaning jumps out at you, or (possibly in the future) incorporating input from the user as to where to place the nodes) and the cleverness of the layout algorithm, and if you're going to be interacting with the user anyway, that puts less of a premium on the absolute best algorithm.
I did recently have a graph where I did have the urge to "just tweak it a little" (i.e. incorporate user input in graph layout) but I could imagine that going away with the improved heuristics you've got above. Let me know when I can get that from the public repository; I haven't done the learning curve climbing to build locally yet.
I just released 0.11.1
today, which has everything in it with new defaults, e.g. sugiyama()(dag)
should just produce that. This is mostly but just copying all the details that graphviz spelled out.
On the larger note, I agree with your torn-ness about the layout and how much a "better" layout is worth.
On a personal level, I'm interesting in finding / tweaking louvain clustering, which is a hierarchical modularity based clustering and adding interactivity on top of it and d3-dag to create something that might work for large graphs, but that takes time and its not in the cards in the short term.
On the slightly shorter term, one of the graphviz papers talked about a version of sugiyama style layouts that work for nodes with arbitrary heights and won't produce unnecessary gaps. This aspect doesn't matter much for your purposes, but a side effect is the ability to pin nodes or edge control points in arbitrary positions and lay the rest of the graph our around it. The downside of this approach is it's more computationally expensive that the current version, but I'm looking to see how hard it is.
Hopefully this generally works better now with defaults that are closer to graph viz. I also added tweakShape
as an implementation of the port work around you suggested which works quite nicely.
My last comment was over a year ago, and I didn't intend this to take this long. If you still think there are things to be done to support moderate sized graphs, I'm willing to play around.
I honestly think a better alternative would be to use something like louvain clustering, and repeatedly split the cluster apart to get a good sense of the graph. I have no idea how hard this would actually be to implement, and part of it would work better with a more dynamic node size option for sugiyama which I've worked on but is much harder to support.
I'm going to close this for now, but for any of the above reasons, feel free to reopen.