d3-dag icon indicating copy to clipboard operation
d3-dag copied to clipboard

Improve d3-dag for interactive use

Open curiouserrandy opened this issue 3 years ago • 28 comments

(I think of this as an umbrella issue to discuss what modifications to d3-dag might be useful; from which other issues can be spun off as we decide what's actually worth doing.)

Erik, for context, I have the following two goals:

  • [Primary] See if adding interactivity can make useful visualization as a way to understand very complex entity relationship diagrams. An example of this would be visualizing the type relationship diagram or callgraph of a large computer program. For any reasonably large program, the full visualization would qualify as abstract art :-}, but exploring it interactively a bit at a time might still provide useful understanding to someone new to the program.
  • [Secondary] Allow interactive creation of easily visualized programming constructs, for example complex Excel formulas or SQL selection statements.
    I'm going to focus on the exploratory visualization for the moment.

I imagine two different possible workflows in approaching getting information from such a dataset: 1) Creating "some" visualization of the whole thing that provides enough information to allow making reasonable choices about zooming into a small enough section which can be more effectively laid out (example: Dependencies between compilation units, or directories of compilation units), or 2) Starting small, with a single entity or small set of entities that's textually defined, and expanding out from those entities to create the subset of the larger graph that has most interest to the explorer at that moment in time. I consider (1) noticeably harder (because the nature of the visualization almost certainly has to change at different scales) so I figure I'll start with (2). But for either pathway, the more complex a visualization that can be understand without interactivity the better, which is where issue 78 came from.

Ideas that occur to me that might make the above easier (though obviously there's plenty of work I can do without any changes in the basic library):

  • Separating the node placement and line drawing stages of layout, so that someone could reposition a particular node in a place that seems better to them and have the lines update around them without changing the other node positions.
  • Potentially having a way to "privilege" a node or set of nodes to make its relationships clearer at the cost of obscuring relationships between other sets of nodes that aren't the primary focus of attention.

(It's not a library change, but I also want to make zooming keep the text and node size the same but scale out the lines, which I think would make zooming and panning more useful just by themselves.)

I do definitely see the value of the potential improvements you suggested in https://github.com/erikbrinkman/d3-dag/issues/68#issuecomment-1009199581, I'm just waiting to jump in and ask for them until I actually run into problems that they'll solve.

What are your thoughts as to the best directions to explore in this area, either in or outside of the context of my long-term goals?

curiouserrandy avatar Jan 21 '22 20:01 curiouserrandy

Thanks putting this together! This is a great place to start, and I think would ultimately be a very useful aspect of d3-dag. I see it as a testament if we could create an observable that allowed you to input text or a link to json (inputs are tricky, don't want to focus here) and explore the dag generated via potentially these two methods. This kind of thing can get out of scope quickly, so we can definitely start with basics.

For your two goals, I think if we can nail the primary exploratory, anything you'd want to do with visual creation should be feasible. I personally want to stay away because drag and drop / ui stuff can get tricky to get right, and it's not the point of this library, but if we get a satisfactory type of exploration without the features to do creation, it's something to look into.

For your two broad ideas, I agree that 1 is more challenging, and in general, you're asking how to make a display to understand complex data, of which the answer is, there's no good answer. That's why data visualization is a field. Even with graphs it's difficult. My general feeling with understanding large graphs is you never want to display the whole thing. At some point when a graph or a dag gets too big, you probably want to cluster it, and display the clusters (or some other way to compress). If those are semantic and user defined, then great. If not, there are plenty of algorithms depending on what you're concerned with. This is not to say I don't want d3-dag to be incapable of laying out large dags. In #78 the dags are small enough that they can be represented, but d3-dag does a less than stellar job on them, so room for improvement. What I more see as the problem here is where the interactivity lies. Arguably there's interactivity here with clustering and then zooming in, however this sounds very difficult. I think it'd be cool if d3-dag came with tools to do this, and I still think it'd be good to have, but for the sake in interactivity, lets start with 2. I'm sure there's some cool work related to this that I'd like to look at before I tried to do too much from the top of my head.

As for two, this is something I think d3-dag should reasonably be able to support, so lets dive in to what we would ultimately want. For the purposes of this issue then, lets focus on progressively displaying (or hiding) a dag one node (or neighborhood) at a time, or modifying the layout of nodes that are already displayed. Lets also focus only on sugiyama style layouts. This is mainly for simplicity, and it more represents the style of layout we care about.

Exploration

In terms of actively exploring a graph by starting from a subset of nodes, what's your thought for interaction? Say we start with a single node by id. I'm imagining a set of activated nodes, these are the nodes we care about. We then have the set of neighbor nodes, these are all nodes that are parents of siblings of an activated node. I'm imaging the layout displays all neighbors and activated nodes, and edges in between them. Clicking on an activated node deactivates it. Clicking on a neighbor node, activates it. Is this roughly what you were thinking of or something else?

Node Placement

Separating the node placement and line drawing stages of layout, so that someone could reposition a particular node in a place that seems better to them and have the lines update around them without changing the other node positions.

It's important here to understand how the sugiyama layout works:

  1. It assigns every node to a layer.
  2. It splits the edges up into dummy nodes. Dummy nodes always have one parent and one child. After splitting this, each edge always goes from one layer to the layer below. This allows us to position edges so they curve around nodes.
  3. We swap nodes (and dummy nodes) on each layer to minimize crossings. This is optional, but layouts look bad without it.
  4. We assign float coordinates to nodes (including dummy nodes) to make the layout pleasing.
  5. We remove dummy nodes, storing their positions as edge data.

It would take a rewrite / a different main function, but we could keep the nodes in the layered layout with dummy nodes. You should then be able to move nodes up and down a layer, or left and right in a layer. I'm not going to write drag and drop stuff, but it wouldn't be too hard to add buttons to adjust this once the backend supported it. Is this around what you're thinking here? Note, while nodes can always move left or right, edge constraints would prevent moving up or down in some cases, without moving a lot of other nodes too, although potentially this could be modified.

Privileged Nodes

Potentially having a way to "privilege" a node or set of nodes to make its relationships clearer at the cost of obscuring relationships between other sets of nodes that aren't the primary focus of attention.

Given the way that sugiyama works above, do you have an idea of how this would work? I guess I understand the idea of marking a node as privileged. What I'm not clear on is how this should change the layout. Almost every aspect can already be tweaked by a function of the node, so it would be possible to add a flag to the node as privileged and change various aspects, but I'm just not sure what you want to change.

Zooming

(It's not a library change, but I also want to make zooming keep the text and node size the same but scale out the lines, which I think would make zooming and panning more useful just by themselves.)

This is mostly UI nonsense that you can probably find examples to do on observable (or if not on observable, then in general js), but it not really really something I'm interested in exploring.

Summary

There's a lot here that I'm interested in exploring / working on, but it does generally end at UI. I'm not super interested in working on or build drag and drop, or optimizing any of this with respect to one interaction, but I am interested in making sure the library efficient supports (as possible) whatever would be required from those interfaces.

I left a lot of questions for how you see this working. If you could respond, and maybe prioritize what you think would be most interesting, we can start there.

erikbrinkman avatar Jan 22 '22 00:01 erikbrinkman

Your orientation makes a lot of sense to me. Specifically, I wasn't actually hoping to get you involved in this UI side; I've always figured that any work in that area would be mine (it's why I'm working in d3). Your willingness to work with me on improvements to the d3-dag library is already something I'm very grateful for; "Patches welcome!" is the response I expect when asking for improvements in open source libraries :-}. I just gave you the UI background so you have the context to critique my requests. And I completely agree that the right initial use case to target is starting small and adding nodes, presumably based on a large internal dataset (which the d3-dag library wouldn't see all at once).

Exploring

My somewhat unformed thoughts on exploring graphs starting small were very close to yours. I wasn't defining neighbor nodes as parents of siblings; in my mind they were just any node (parent or child) that had a link to a primary node. But yes, click to activate a neighbor node or deactivate a primary node, and that would result in an animated re-layout. I don't think this has direct library implications; the library would only know about the primary and neighbor nodes (or potentially just the primary nodes if the UI items indicating the presence of neighbors were small active regions on the primary nodes, though that model invites drastic layout changes upon primary node set change rather than just allowing them :-}).

Node Placement

Drat, what you say makes sense and does make this harder. A possible alternative: I know that graphviz allows specification of locations for some-but-not-all nodes in a layout, but I don't know how they do it; possibly we could add that feature to d3-dag? Then if the user moves a specific node to a location, that node (and possibly others previously moved) gets a fixed location/switched to having it's location controlled by the user, and a full re-layout of all the other nodes is done. I have no clue how this would work in terms of layering assignment (maybe worthwhile investigating how graphviz does it) and it would be unfortunate if just fixing a location without changing it drastically changed the rest of the layout. It's a tricky issue. My mind starts wondering about other layout algorithms than Sugiyama, but I suspect that's something else that's worth thinking about only if we run out of options working with Sugiyama.

I will say that I am resistant to making "layering" a concept that the user moving nodes around needs to think about much; from a UI perspective I'd like this to be about nodes with directed links between them, and the layout as being about making those node relationships as clear as possible visually; the fact that Sugiyama does layering->dummy node creation->decross->coord assignment I'd like to keep as an implementation detail.

Privileging Nodes

If we do add the "fix node location" option to d3-dag, I could imagine this being done in two passes. The "privileged" part of the graph would be laid out by itself, its locations fixed, and then the rest of the graph could be laid out. This could be done without any library modifications other than "fix node location", which would be a win.

I'll note that the privileging idea isn't high priority for me; it's just a random thought I had. Node placement is more important, as I think it's close to the first thing a user of the exploration program will want to do: "Well, this graph is ok, but it'd be a lot clearer if I just moved this node over here". And it's not the best design to make the first thing the user wants to do when they first open your program impossible :-}.

Summary

So my current thoughts are that there are no blockages to my working on the ground-up exploration interaction with the library as it currently is. That'll have the advantage of letting us find out empirically if the animated re-layout is too distracting or confusing and if we need library support for incremental changes. I'll work on that. In terms of library support I'd like to have in the near future, I suspect being able to somehow merge user preferences and layout choices with regard to node placement (to reach for as abstract a way to describe what I want as possible) is highest priority, even though I completely hear you about it not being obvious how to do it. But I think the ball is primarily in my court; I'll play with the ground up exploration and let you know what I come up with.

curiouserrandy avatar Jan 23 '22 15:01 curiouserrandy

Actually, one other thought, at the architectural/philosophical level. When I first started looking into doing exploratory graph understanding, I tried using D3's force-directed layout. That library is simple, easy to understand, very customizable, and does absolutely nothing for avoiding edge crossings. (Secondarily, I also missed the conceptual organization that Sugiyama tended to add to my graphs because of the layering step.). But d3-dag is much more of a monolith library than the force directed layout. That's conceptually not as good a fit for d3 (because d3 tends to be oriented towards combining lots of little pieces), and practically means that I can't use, e.g., the decrossing support of Sugiyama with force direction for the final layout.

Would it make sense to consider allowing use of the individual pieces of d3-dag separately? It would be a partial solution to the node placement problem above (layering + dummy node + decrossing could be done once, and force direction could be interactively with dragging-and-dropping) and might allow for incremental exploration of non-Sugiyama layouts.

Just a random thought--I think my priorities are still the same as at the end of the previous comment.

curiouserrandy avatar Jan 23 '22 16:01 curiouserrandy

This makes a lot of sense, and thanks for the feedback. Conceptually, I had the idea of making the layering part of the on the fly layout. One line of thinking, is even in the progressive node adding scenario, we might not have to reoptimize things like layers or number of crossings. If we add one node at a time, then maybe the computation can be cheaper as we amortize it over user interactions.

That being said, that is complicated, and I agree conceptually that exposing the ways sugiyama works so closely to the incremental layout might be bad. People playing with a graph don't want to be constrained by the layers, even if the layout uses them.

I'm interested to see how a demo progressive expanding of a dag works. I like observable, but am pretty web literate, so let me know when you have something working anywhere to investigate.

As for a steps in the right direction, I see two immediate things for me:

  1. Do what you said in the second statements. The sugiyama function mostly just strings together all of the steps, but some of the steps are only exposed internally. I should pull those functions out so that enterprising people can have an easier job as customizing the steps.
  2. If we ignore the potential cost of re-laying out the dag every time, it should be possible to alter each step to preserve node positions, at least roughly. The currently layout for the demo large dag renders quickly enough with simplex layering and quad coordinate assignment. The only hard part if the decrossing. If we take in node positions, and use those to fix relative layerings, that should kind of guarantee the y positions. Decrossing we just need to add a constraint if two positioned nodes are in the same layer, and quad is a quadratic optimization, so we can just fix the final x coordinates. The problem here comes in with how we manage node heights and freely positioned y coordinates. If every positioned node gets they're own layer, that's fine, but if they're close in y coordinate, the layering won't make that much sense. Alternately we could snap them to the same layer if they have similar y's, but that will either make the edge crossings a bit weird, or for a good layout will require snapping the y to layered position. Do you have thoughts on these alternatives? 1. should be easy, and I can try to get a working version of this soon after. It should also allow the positioning of dummy nodes (e.g. edges), but I'm not sure what the api would look like.

erikbrinkman avatar Jan 24 '22 06:01 erikbrinkman

I've been thinking about your question #2 for the past couple of days without coming to any blinding insights. In lieu of any of those :-}, here's the general path of my thoughts--there may be some utility to them.

  • From a UI point of view, I could imagine the user wanting two different things when they try to drag a node to a new position: a) The rest of the graph being re-laid out around the node, or b) all the other nodes staying where they are and only the dragged node and edges (and thus for Sugiyama, the dummy nodes) moving/being re-laid out.

  • When I think about how I'd ideally want (b) to work, what keeps coming up is the equivalent of fixing the locations of all of the actual nodes, and using something force directed on the dummy nodes; I could imagine that'd give the kind of feedback the user is expecting while (hopefully) keeping the decrossing and basic layering from the initial Sugiyama layout. I think that's likely to be as close to "good" as we're going to get for that use case, so I'm not sure if it's worthwhile trying to alter the Sugiyama layout to approach it, especially if you're going to be separating out the pieces anyway.

  • With regard to (a), I'll first note in passing that my image of how this would ideally work is a complete re-layout, and my image of how that would work if only one node's position is fixed/controlled would be a squishy, slow kind of whole-graph drag, and if two node's positions are fixed/controlled would be a combination of zoom and rotation. (It's not that I think the user is likely to want either of those behaviors, but I don't think other behaviors would fit well with the conceptual "re-layout the graph as you move the node".). So I'm imagining a situation where they user has moved & fixed the location of two nodes and is moving a third.

  • Mapping that into the Sugiyama algorithm, I find myself wondering if re-doing the layering and dummy node assignment makes sense at all. Conceptually, those are derived properties of the underlying graph relationships and have nothing to do with the locations of the node in the plane. And the user is already taking some control/responsibility for the layout being pleasing by controlling the location. So maybe just redo the last two steps interactively (choosing coordinates and removing dummy nodes)? I could imagine coordinate assignment algorithms that, yes, pay attention to layering, but also pay attention to fixed locations of nodes. Which, yes, starts to sound similar to what option (b) above, only for the entire graph that the user hasn't fixed :-}. It does make me more interested in what the coordinate assignment algorithms look like than I was before I got to this bullet point.

I can imagine what the UI for positioning dummy nodes might look like (drawing programs I've seen have "handle points" along curved lines to allow people to play with how the curves go), agreed, I'm not sure what the API would look like. In terms of the algorithm, it's parallel to fixing non-dummy nodes, so that might suggest shifting the API for expressing points that a line would be using towards the one for lists of nodes, only associated with an edge, so whatever API we use for fixing nodes could also be used for fixing dummy nodes. Of course, that's tricky, because you don't want the user to be able to create dummy nodes, since they're a function of the layering and the start and end node layers. Hopefully this will become clearer as we make more progress on the node location specification.

curiouserrandy avatar Jan 25 '22 18:01 curiouserrandy

My initial demo for interactive graph exploration is @ https://observablehq.com/@curiouserrandy/initial-interactive-graph-explorer. Pan&zoom still work, and clicking on nodes should switch them between active and inactive. All nodes connected to active nodes are displayed, but the only links displayed are to active nodes. Sorry this took so long; there were a couple of learning curves I had to climb. Fun climbs, though :-}.

I basically like it, but it's got a lot of work and polish still to go. Thoughts off the top of my head:

  • It's still very easy to get confused when there are many nodes displayed, easier than it should be IMO. To take a look at this, make "Iron Plate" active and click around a bit. I'm hoping your exploration of issue #78 bears fruit here. I may also change how "ghost nodes" (nodes that are connected to active nodes but aren't active themselves) are shown; if there are more than three of those attached to any given active node, I may have an active surface on the active node signaling that there are undisplayed ghost nodes. I consider UI issues my problem, but if you have any thoughts, I'd be interested.
  • There are improvements to make in how the graph adjusts on layout. My current algorithm is "Don't change zoom, but pan so center of layout is in center of viewport". Alternatives I want to play with are "fixed zoom, keep clicked node centered (if it exists in new layout)", and "pan to match centers and change zoom so that the new layout fits in the viewport". Having said that, I don't, at least at the moment, feel like the re-layout process is getting in the way, and even if it does, I could imagine visual signaling to make that easier (e.g. color nodes adjacent to the one clicked during the animation only to make it easier for the user to track the section of the graph they're paying attention to).
  • I showed this to my partner, and while they liked it their response was that the first thing that'd want for such a tool is seeing the entire graph at once, even if it was so large that it was impossible to interpret. They said that they thought that'd still give them a fair amount of information, including both whether the graph had disjoint subgraphs and a visual indicator of just how complex it was. This makes sense to me, and as mentioned elsewhere I want to make sure to satisfy initial urges I expect people to have in using this tool. Which probably means thinking about dirt simple layout algorithms for extremely large numbers of nodes, and changing those algorithms as one zooms in. I find myself wondering how high force direction can scale, and also about graph analysis algorithms that can find "clumps" in graphs and group them for a higher level layout.
  • I want to tighten the ellipses around the text (and probably find a way to put arrows back in).

Let me know your status when you have a chance.

curiouserrandy avatar Feb 04 '22 18:02 curiouserrandy

Hey @curiouserrandy . Sorry this has taken so long. I've been busy with other things, but I haven't lost interests. Here are my responses to all of this.

From your first comments:

  • I can see (b), and you potentially could render the dummy nodes as interaction points. That might make it easier for someone to just lay it out as they see fit from an original layout. The point of dummy nodes is just to prevent lines intersecting with nodes they're not a part of, so adjusting them in a force directed sense could lead to other issues. Still either should work, and shouldn't require any other work on this library, as it exposes all of that for you.
  • For (a) maybe context about sugiyama is helpful. The layering isn't just a function of the graph, it's also the y coordinate. The function of the graph is only in the sense that edges should point "down". The dummy nodes are just faux nodes that allow us to avoid collision with the actual nodes, so I still think I side with recomputing them two in the notion that three nodes are fixed, but maybe I'm still missing something.
    • The point of removing dummy nodes is just to give a "nice representation" if we're only redoing the last steps, it might make more sense to just keep them as "sugi nodes" and just map dummy nodes to edges. However I'm still not convinced that re-doing the full layout isn't ideal.
    • As far as how the coordinate assignments work, they use the fact that nodes are already ordered on a layer and have a size to pick an optimal position. The quad operator which seems to work for most sizes does this by solving a constrained quadradic program that tries to optimize things like "lines should be vertical" and "lines should be straight".

In general, I'm curious about your thoughts on this. I've made a few changes to support some misc requests, and I'd like to release, but I still want to see if it makes sense to introduce more breaking changes first.

For your second part, I want to start by saying this is really cool, and works better than I imagined. There's some jitter when a node drastically changes the layout, but I don't necessarily think its bad. It does highlight #78 is a larger issue. And from playing around I'm partially convinced that the layering is part of it. Nodes are so close that the graph grows really wide. Since I don't see anything else pressing, this is what I'm reading up on now.

  • I haven't found the ghost nodes to be an issue, even when there are a lot. I do think that if the layout is better for large graphs this won't be much of an issue, so I'm not sure I would focus on it, but maybe I'm missing something. One thing you could do it put a step in between by putting a single ghost node with the text "expand children" or something, which I think is similar to the UI clue you were talking about.
  • I mentioned it before, but I agree, I didn't think the re-layout was too jarring, and it seems cheap enough to just call the layout function each time, so it doesn't seem like right now anything is necessary to customize interactivity from this perspective, although if there was some aspect you think could be changed to make this easier, I'm all ears.
  • It all comes back to making this work for large graphs :P. I think that the types of layouts here can scale if using the cheap versions, and in the layout section you could pick the method based off of the size, there are just aspects that need to work better on large graphs.
    • If you did use something like the incremental sugiyama, then in a browser you could use the greedy the method in a subprocess to slowly remove crossings if the user wasn't interacting with it. I'm not sure this makes a lot of sense now, but I just added the method, so if you're curious I can give you some pointers.
    • More generally if you really want to layout a large graph, I think my go to would be something like clustering. There are algorithms like that can produce a dendrogram, essentially allowing you to get successively smaller clusters. A modern-esqe algorithm is louvain clustering, but I didn't see any js implementations. You could make a layout that starts at the whole graph with the fewest number of clusters. Edges between clusters could have edge thickness representing the number of edges that were subsumed. If you click a cluster, it could expand to 1 level more granular clusters, and the graph could be re-laidout that way. That might give more of a high level summary. I think given the way your current demo works, you should be able to do something similar with d3dag as it is.
  • For this last part, one thing you can do is actually render the text on the svg. You can then use getBBox() to get the text size to make the nodes tight. If you do that beforehand, you can also use that to indicate node size for the algorithm.

Summary: This prototype is cool! I think the next thing I have to do is figure out how to get larger graphs to work well. Let me know if I missed something obvious.

erikbrinkman avatar Feb 26 '22 23:02 erikbrinkman

No worries, @erikbrinkman! Life happens. I remain grateful that you consider this work worth investing your time in.

Preserving node positions

WRT (b), yes, I think it makes sense to wait until I've had a chance to do some playing around with different UIs and what the library currently provides before we worry about more work in this space. I'm not sure I'll get to that soon--there are several things that are higher on my priority list. But I do want to work on it.

WRT (a): Yes, I spoke a bit too broadly. I know that the layering derived from the basic Sugiyama layout is used to assign the Y position. But you could imagine a derived algorithm to support manual adjustment where that was only done for the initial Y position, and after that the user had the responsibility of not messing the graph up too much as they moved things around. This allows the algorithm not to worry about what I think of as the back-mapping problem (from Y coordinate to layer) if the user specifies a y coordinate that either doesn't clearly map to a layer or maps to a layer that doesn't make any sense from a reasonable DAG node -> layer assignment algorithm. And I don't think it would be too large a burden to place on the user, though it would require them to manage the dummy node positions too. Though this is conceptually moving back towards (b).

I certainly agree that it's less burden on the user if the rest of the graph can be nicely laid out in response to their movements. Maybe it's worth figuring out how graphviz does this? (I hesitate to say that, since I found the graphviz source not well commented/documented when I looked at it, but you may find it more transparent.)

Interactive demo

I'm glad you like it!

  • Ghost nodes. I hear you, but it's still bothering me. This may have to do with how meaningful the semantics of the underlying graph are to the user. There are specific relationships I'm trying to highlight, and the ghost nodes really get in the way of my ability to see those relationships. But I'm not too worried about it--this is just about exploring UI patterns and figuring out which ones work well. I'm certain I can find something that works well in this space, and I can't imagine it'll be dependent on any changes in the underlying layout engine.

  • Agreed, at least right now, re-layout isn't costly or jarring, so I'll keep doing that. I think we'll need to revisit this as we attack the "Layout the whole thing and focus into a part of it" use case but for right now, this isn't a problem.

  • Which, yeah, large graphs :-}. I had the same thought about using the cheap layouts at the large sizes, and clustering. When you say I should be able to do something similar with d3dag as it is, what is your image? That I would run a clustering algorithm on the graph before giving it to d3-dag, or that there was some way to use d3-dag to do the clustering?

New issue: Arrows/annotations

I also wanted to give you a heads up that one thing that I want to work on is a more general way of handling/managing intersection points between node graphics and edges. I don't think this belongs in the d3-dag library, but I could imagine it being generally useful enough for a contrib/ directory, so I wanted to run it by you. The design I'm currently thinking about is:

  • Create a function that post-processes the layout. That function goes through the graph and:
    • For each node, check if that node has an "intersection" function defined. If it does:
      • For each edge going into that node, call the "intersection" function on the line between the node center and the last-but-one control point on the edge.
      • Replace the last control point on the edge (presumably the node center) with the result.

This means that graphics functions (that want to draw arrows or label arrowhead/arrowtails) can get the place where the edge intersects the node representation. It allows arrows to be drawn in whatever way the user wants (normal to the surface, pointing at the center, along the line, etc.). It means the point at which the arrow is drawn is exactly the intersection point (as opposed to being slightly off because splines).

I wanted to expand this to allow supporting of the graphviz "port" functionality, but I couldn't see a way to do that--I suspect that graphviz supports that functionality by drawing the nodes before doing the layout, figuring out where the ports are, and fixing their relationship to one another in the input to the layout, which has much more integration between graphics and layout than I think we want. But if you think of some way to tweak this to support ports, I'd be interested.

Priorities

I'm inclined to agree with you that the major next thing I could use from d3-dag is support for large graphs, though I'm not certain--as you say, it may turn out that the best way to handle such things is by clustering or by using a simpler algorithm, depending on the graph size, and that both of those can/should be done by the calling code. I'm also quite interested in whatever you find in your exploration of issue #78; given that I think that's more tractable and more certain to be of use, it may be worth starting there rather than on the abstract "large graph" problem, at least until I have a chance to try the out of library techniques we've talked about. I'd be inclined to put solving the "integration user node placement into layout" problem third, though I'm happy to keep noodling on it--I believe that eventually that will be useful/important, but it's not actually a hard practical problem at the moment, at least for me.

My current priorities are:

  • Playing with the UI to make the current demo as useful as possible for its purpose (which, yes, laying out production lines in a computer game, but it's still a useful example problem).
  • As noted above, I'd like to come up with a semi-general way to handle arrows and line annotations.
  • After that, I plan to start playing with the "Big picture -> focus in" workflow.

Thanks very much for all your thoughts and questions!

curiouserrandy avatar Feb 27 '22 22:02 curiouserrandy

Preserving node positions

The more we talk about this, it seems like the idea of computing a layout based on fixed positions is really not the first goal, so it seems like maybe for now this is limited to dragging nodes around afterwards, which is out of scope. If you're interested in in recomputing parts like just running the coordinate assignment (which only looks at x coordinates) would be possible, although coordinate assignment is predicated on layers and dummy nodes, so ymmv.

Interactive demo

  • ghost nodes : the more I think about it, the more I see the idea of a two click implementation for ghost nodes, one to show all children and one to pick the one to fully expand.
  • unfortunately clustering is not built into d3-dag, and again I think it's probably outside of the scope of the library, but something that should be easy to do, and I could see making it in / tailoring to it if it's valuable. My first thought for how it could work is you run the clustering and just show a single graph that is all the largest clusters connected to each other (you might need some special constraints to make sure the dag nature of clusters is preserved. Then, clicking on a cluster "explodes" it to show all components nodes (or smaller clusters). There's a request that I haven't touched about grouping nodes which would allow you to explode a super cluster but still see the cluster boundary. In my reading about about graphviz I may find how they worked this aspect into, so it's possible it might finally get done.

Arrows/annotations

I like this idea. It could totally be done independently, but it also wouldn't be hard to as a final step to sugiyama. One of the benefits is that it can take both the line and the node as input, making it very easy to implement ports in that function. So I agree that it's not necessary to be included but this is elegant enough that it could also make sense. It also does fox the spline issue by attaching the point to the exterior. Angle may still be a bit off, but that's less of an issue.

Did you have thoughts on annotations? You mentioned it in the header, but didn't say anything here.

Priorities

  • Cool, agreed, and yeah, my goal is still to work on #78. I'm not planning on going through their source code, but instead plan to read this and this
  • Yeah, I like your suggestion, I think it makes a lot of sense, and will look into it after large dags look more like dots :P
  • Cool. Interesting if more things will come from that exploration.

erikbrinkman avatar Feb 28 '22 03:02 erikbrinkman

Arrows/annotations

Sorry, I should have mentioned, but mostly it's that if the location of the intersection point is known, the display code can put text at that location. For material flow diagrams (like the one I'm playing with now) that's usually a number that represents the amount of material input to the node or output from the previous node.

I could imagine the user wanting to make sure that the annotation doesn't overlap the line, but the nice thing about d3 lines is that, IIUC, they're basically functions from [0,1] -> (x,y), so if you get line(0.99) and line(1.00) you'll have a delta that tells you the angle of the line so that you can avoid it. This can also be used for getting the angle of the arrow correct.

I'll let you know when I have an example of this and we can decide if we want to bundle it with d3-dag.

Priorities

One thing I forgot to ask: You were going to separate out the different stages of the Sugiyama layout, so they could be done independently. Is that still on your roadmap? I don't need it immediately (I'm basically unblocked for the next 2-3 things I want to do, which is wonderful) but I could definitely imagine that being useful down the road.

curiouserrandy avatar Mar 02 '22 03:03 curiouserrandy

  • lines: Ah, I hadn't thought to use the d3.curve api that way, but it makes sense, and seems like it'd give you a close approximation.
  • broken out: I already did this! However, because I don't want to pollute the namespace they're not exported at the root level (in the bundle), you have to import from the "library itself". After thinking about this more, that may fail because importing from the library could end up searching for dependencies that aren't there. I'll have to think more about how I want to expose all of this, but you can view the results here: https://github.com/erikbrinkman/d3-dag/blob/b06e1880e0d6abd012d6cc5e723a3907bf0961cf/src/sugiyama/index.ts#L348-L388 That's the full layout algorithm, so you should be able to call each piece independently if you can import them (which you shouldn't be able to do just yet unfortunately). Does this look like what you expect? It's an important reminder to export these in the bundle and think more about external dependencies with the esm exported library

erikbrinkman avatar Mar 02 '22 03:03 erikbrinkman

Through more digging I've also found this which seems have just compiled dot in wasm and then uses that from js. The API still seems to require a dot format string - so it seems a little awkward, but if you weren't aware I wanted to call it out.

erikbrinkman avatar Mar 03 '22 03:03 erikbrinkman

Breaking out the different pieces of the Sugiyama layering: It's hard for me to tell what I think just based on that source file--the major question in my mind is what the API representation is after each of the steps and how easy it would be to do "something else" with it other than feed it into the next step. So for instance the "// update original dag with values" comment is concerning to me as it suggests there's state that isn't in the dag between steps and hence would be hard to access. An example (probably the first things I'd want to try with this, when I'm through the higher priority things on my plate :-}) would be to go through layer assignment, decrossing, and coordinate selection (for both dummy and regular nodes) and then dump it into a d3.force() layout and see what happens; that might be a useful use case to have in your mind as you think about the step separation. (That example does make me wonder where the dummy node assignment and removal occurs, since the example would require stopping before removal to get the value of force repulsion between dummy and real nodes.)

So maybe the thing to do is to solve the export problem and then I can give you opinions on the data structure? (Since it probably doesn't make sense for you to put a lot of effort into documentation while the API is in flux.)

dot in wasm: Yeah, gonna stick with my prejudice against wasm for the moment :-}. And the dot language seems clunky as a programmatic interface. But thank you for calling that out; it might be something to look at in the future.

curiouserrandy avatar Mar 03 '22 20:03 curiouserrandy

I do need to figure out how I want to export them, and release it, or at least export them in a version I hand to you temporarily, but I'm going to go ahead and do what you said not to do, because I feel like it. Maybe I'll put the documentation back into source, or decided to document it somewhere else.

 function sugiyama(dag: Dag<N, L>): SugiyamaInfo { 
   // compute layers
   // this assigns the value property of each dag node to its non-negative integer layer,
   // e.g. you could do something like `for (const node of dag) console.log(node.value)`
   // to print out all of the layers
   options.layering(dag); 
  
   // create layers
   // this is the most "doing something step". It wraps the dag in a new dag.
   // In this new dag, there are nodes that correspond to nodes in the original dag
   // which have data that look like `{ layer: <node.value from first step>, node: node }`.
   // It also contains "dummy" nodes for rendering long edges. Those have data that look like:
   // `{ layer: <layer of the dummy node>, link: <link from the original dag that this dummy node is on> }`.
  // These wrapper dag nodes I'll call "sugi nodes". The output of `sugify` is an array of arrays
   // of sugi nodes where all nodes with that layer are in that layer of the first array,
   // e.g. `layers[0]` is all sugi nodes with `{ layer: 0, ... }`
   const layers = sugify(dag); 
  
   // cache and split node sizes
   // this splits the node size call into two different functions, one that returns the x-size
   // and one that returns the y-size. It also caches the values so that we don't call the
   // function more than once.
   const [xSize, ySize] = cachedNodeSize(options.sugiNodeSize); 
  
   // assign y
   // takes the layer values and assigns the y coordinates
   let height = verticalCoord(layers, ySize); 
   if (height <= 0) { 
     throw new Error( 
       "at least one node must have positive height, but total height was zero" 
     ); 
   } 
  
   // minimize edge crossings
   // shuffles the nodes within each layer to minimize edge crossings,
   // modifies the arrays in place
   options.decross(layers); 
  
   // assign coordinates
   // based off of the node size an the positions of every node within the arrays,
   // assigns their x coordinates
   let width = options.coord(layers, xSize); 
  
   // verify
   // verifies that the coordinate assignments are valid
   verifyCoordAssignment(layers, width); 
  
   // scale x
   // if a new size was specified, simply rescales everything
   if (options.size !== null) { 
     const [newWidth, newHeight] = options.size; 
     scaleLayers(layers, newWidth / width, newHeight / height); 
     width = newWidth; 
     height = newHeight; 
   } 
  
   // Update original dag with values
   // takes the coordinates of the sugi nodes and if they point to regular nodes,
   // simply copies the x and y values. If they correspond to dummy nodes, it aggregates
   // all points along a chain into an array, and overrides the links points array
   unsugify(layers); 
  
   // layout info 
   return { width, height }; 
 } 

Some comments on the things you called out as a result:

  • if you wanted to apply a force directed layout to dummy nodes, you could stop after almost any step and use the layers variable. After calling sugify there will be no particular order. after calling decross, if you give them initial positions according to their indices you'll get a loose approximation. If you call it after coord, you could be able to layout the nodes as they would be in the final graph, but each of the dummy nodes would have their own point too.
  • unsugify simply copies over the data, so before calling unsugify (or after calling it once) you could change the x and y coordinates of any of the sugi nodes and then call it again.
  • hopefully this makes it a little more clear how you could modify this with drag and drop. You could move sugi nodes between layers and update their dummy nodes, or potentially warm start the process from the past version.

hopefully that's somewhat clear. I am still figuring out how I want to export them, and I'll update here when I have.

erikbrinkman avatar Mar 04 '22 03:03 erikbrinkman

So let me give you my thoughts as they come up reading through your documentation:

  • Layering: Looks good, though probably the name of the member should be something like "layer" instead of "value".

  • Node size: I don't think this would be visible in an exported breakdown, so looks fine.

  • Height: I presume this modifies the dag in place, so .y will have the assigned value? If so, sounds good.

    I will say that it's a bit strange from a naive POV to have a y coord assigned without an X coordinate (though it makes sense in the context of the Sugiyama algorithm), so I find myself wondering how the later stages depend on this stage? If there isn't a dependency, postponing this until you assign the X coord would probably reduce consumer confusion. But it isn't worth any heroic efforts for what I think would be a minor improvement; just documenting clearly that this only assigns the Y coord would be fine

  • Decross: I think what's happening here is that the ordering in the layers[n] array is being permuted; is that accurate? If so, worth specifying explicitly. Beyond that, looks good.

  • Coordinate assignment: Looks good.

  • Scale: Looks good.

  • Unsugify: Hmmm. In the ideal world, we'd have minimal differences between the output format and the intermediate formats. Would it be worthwhile thinking about expanding the dag data structure so that it could optionally include layers[] arrays and dummy nodes? I'd be tempted to include dummy nodes by default and have an easy documented way to turn that into a spline on the fly, since I can definitely see potentially wanting to interact with those nodes.

curiouserrandy avatar Mar 05 '22 21:03 curiouserrandy

  • layering: I agree, but for d3-dag layer is a generic settable numeric value, and I opted to remove specifying one in the type just for sugiyama layering.
  • node size: I think I'll export the caching function, but yeah, you just have to split up whether you're talking about x or y size if doing this manually, but I guess you make a good point that I don't need to export the caching function.
  • height: yes verticalCoord assigns to the y value. The reason it's separate is that x coordinate assignment is customizable, but y coordinate assignment is defined by the layer, so there's no reason to require all implementations of coord to run this same function.
  • decross: yes, you hit it exactly. I think that's specified in the DecrossOperator interface, but I'll specify here too.
  • unsugify: I'm not sure what you're saying here. In the design of this I tried to separate dag layout from the dag data structure as much as possible. The only way they relate is that the type of the data structure include some assignable variables for the layout and that's it. The dag structure is based off of d3-hierarchy so the idea of a global layers state that's somehow attached to the dag seems inelegant, where as right now each dag node functions as it's own minidag. Also dummy nodes are only inherent to sugiyama not any specific layout. IMHO the idea of creating a new dag data structure for sugiyama that does have dummy nodes is an elegant way to model this, without polluting the dag structure with ideas inherent to sugiyama. I'm not sure what you mean by turn dummy nodes into a spline on the fly, but if you look at desugify, it's pretty easy to take a link to a dummy node and turn it into a set of control points that you could spline.

erikbrinkman avatar Mar 05 '22 22:03 erikbrinkman

I just released 0.10.0, which also exported the three key components: verticalCoord as sugiCoordVertical, as well as sugify and unsugify. I'm open to more discussion around optimal interface, but releasing these fully should help you to explore.

In looking into dot I did stumble across dynadag. Here's a paper about the general idea. Figured you might be interested in taking a look.

erikbrinkman avatar Mar 06 '22 22:03 erikbrinkman

Thank you! That paper is a fascinating pointer for me--it basically nails what I've been interested in exploring from a research perspective. Scooped, 14 years ago! :-}.

The major question, obviously, is why hasn't anything been done with it since then (code last modified 2007, ten citations of the paper with no real extensions or recent activity). I'm going to have to chew on that one--I'm of the opinion that there are a lot of useful applications for this basic technology that just need to be proved out. But if someone's done the basic work already, it's worth revisiting that belief.

Thank you for the release; I will try to do some exploration. A warning that that may be a bit in the future; I want to do some cleanup work with what I currently have, and as noted above, re-evaluate my goals in the light of the new information. But I will try to try out the new interface.

WRT the unsugify interface, talking out loud in case it's useful: I hear you about keeping the layout and data structures separate; I'm just also wanting the output data structure to be as useful for putting into other processing steps as possible. The layout process isn't just about assigning coordinates to nodes, it's also about assigning information to edges to weave them around the nodes/other edges somehow. Maybe there's a way to generalize that information so that it's not-sugiyama specific, but is nonetheless more accessible than the series of spline control points? (Though to be fair, that series of points pretty much are the location of the dummy nodes, are they not?). I guess another way to put this is that I'm not clear dummy nodes are really sugiyama specific; the other layouts use some mechanism (haven't looked) to define the edge pathways; does that not map to dummy nodes?

To be clear, I'm not asking for any changes to what you have before I give it a try, just trying to flesh out what I was gesturing out in my comment above.

curiouserrandy avatar Mar 12 '22 00:03 curiouserrandy

For dynadag / graph, yeah, it does seem like exactly everything you're saying. I haven't reached out to the authors yet, but it's probably worth a try. They were interested enough to create it.

As far as unsugify is concerned, first, dummy nodes are sugiyama specific, they're part of how the whole layout works. The other techniques are all topological, so remove a lot of the necessity of it, and instead just assign edge control points based off of the layout.

In terms of data structures, there's really two, and maybe this layout of the steps will make that more clear:

dag // original dag that is of interest, one data structure to keep track of
layering(dag) // updates dag with the layering (in value field) on necessary up until the next step
layers <- sugify(dag) // create a a new data structure that's thats layers of sugi nodes (wrappers around real nodes, or dummy nodes)
decross(layers) // suffles the order of layers in place, no modifications to dag
coord(layers) // assign sugi nodes x coordinates
vertCoord(layers) // assign sugi nodes y coordinates
unsugify(layers) // copy x and y from layers back onto original dag nodes for layout. Somewhat unnecessay if you wanted to construct points from the layers structure

hopefully that makes it clear? It might more sense if layering returned the layers data structure, but then basically every implementation would need to call sugify. I know you already know this in general, but I figured this layout with it calling put what's modified when would make it more clear.

If you wanted to be really "clever" you could probably change the sugi-nodes to return an object that has getters for the sugi nodes x and y. Then you wouldn't need to "unsugify", the x and y in points would just fetch the appropriate fields from the underlying suginode. That cleverness seems unnecessary and requires the suginodes can't get garbage collected, so it's probably not worth doing, but I wanted to call it out.

erikbrinkman avatar Mar 16 '22 00:03 erikbrinkman

I think I'm going to hold off on further comments on the broken out API until I try to use it for something; I can speculate before that point, but I'll really have more useful comments after I try to connect the rubber with the road.

Sorry not to have gotten back to you earlier, by the way--life's been distracting recently. But I've got the intersection routines working. I'm going to add arrows to my test notebook and hand it over to you for commentary on library integration, then probably take a step back and evaluate the various new investigative pathways you and others have suggested. I'll let you know how that all goes.

(The other reference I got off-issue was “Search, Show Context, Expand on Demand”: Supporting Large Graph Exploration with Degree-of-Interest & an open source project (Gephi) that's in this area under more active development, just in case they have interest to you as well.)

curiouserrandy avatar Mar 24 '22 00:03 curiouserrandy

That's fair to hold off. I think it should all be exposed, so feel free to play with it. Always happy to revise things or reconsider as they're used.

No worries on the delay. I only have so much time as well. Excited to take a look at what you have though! I think it should make a good hybrid between full port support and the way that d3-dag generally passes splines off to some other routine.

I'm aware of gephi, or at least was about a decade ago. I was never a fan from a usability / software perspective, but the people behind it definitely spent a lot of time thinking about graphs. The paper itself looks interesting, and definitely worth reading before I pursue my own path.

erikbrinkman avatar Mar 24 '22 03:03 erikbrinkman

Erik: My apologies for the radio silence; various distractions (including, unfortunately, a car accident :-{). But I've updated my main notebook with arrows according to the architecture sketched out above and republished it so that you can take a look and decide how/if you want to integrate it into the main library. Specific pointers:

Primary notebook is still https://observablehq.com/@curiouserrandy/initial-interactive-graph-explorer.

  • The "transform_dag" function is called immediately after layout to modify the link endpoints according to the "adjust_endpoint" functions bound in the node data.
  • I'm using the function "generic_adjust_endpoint" for this purpose, which intersects straight lines from the last control point through the center of the node with either an ellipse or a rectangle. I've written intersection functions for ellipses and rectangles in a separate notebook (https://observablehq.com/@curiouserrandy/notebook-for-line-shape-intersection-operations) and imported them.
  • I compute barbs for the arrows through a function from another notebook (https://observablehq.com/@curiouserrandy/routines-for-arrow-creation) but beyond that the arrow barbs are implemented inline in the plotting.

Feel free to ask me for any modifications to this architecture that you'd like to make it easier to incorporate; I've tried to have what strike me as good architectural bones in what I've done, but it's in JS not TS, so I don't guarantee how easy it will be to import.

My next couple of mods aren't particularly graph layout related (I want to make my control structure separate and mutable in the notebook and create some kind of persistence mechanism for it for better usability) so there's probably going to be another period of radio silence from me :-}. Feel free to ping me if I can be useful.

curiouserrandy avatar May 03 '22 02:05 curiouserrandy

@curiouserrandy I am just curious (ha), when you are animating from one graph stage to the next, are you also able to make it smoothly zoom to fit at the same time that it pans? I've done something similar with another library but could never get that behavior quite right - it would always expand to a non-zoomed new position before zooming in to the desired zoom level, which would mean that some elements would temporarily be outside of the viewport boundaries.

tunesmith avatar May 14 '22 23:05 tunesmith

@tunesmith : My experience with animation and transitions is that they're easy to get wrong, but that it's generally possible to do exactly what you want. Sorta like the rest of d3 :-}.

I'm not completely happy with the animations I have with my graph (and now that you mention it, I think I have the same problem you describe, though maybe in reverse--my graph shrinks and then expands back out when I change the zooming) but I'm confident that that's my holding it wrong and if I dig into how I've specified the two ends of the transition I'd be able to get it to do what I want. If I have a chance, I'll see if I can figure out my behavior (if you want to do it, go to https://observablehq.com/@curiouserrandy/initial-interactive-graph-explorer, click on the "Iron Plate" rectangle, then after that on the newly displayed "Iron Plate" ellipse) and I'll let you know what I find.

curiouserrandy avatar May 16 '22 13:05 curiouserrandy

That sounds about right compared to my experience. For my project that uses dagre-d3, I think I used two simultaneous transitions, one for position and one for zoom, and d3 was apparently smart enough to interpolate them at the same rate. If you go to concludia.org (my private-through-obscurity side project) you can see an example of how it animates. My current navigation style is limited, though - your active/inactive approach is more flexible and is something I’ll try out after I figure out how to replace dagre-d3. I was also thinking that for nodes with too many children, maybe I could work around that with a modal chooser that allows you to select which children to display.

tunesmith avatar May 16 '22 17:05 tunesmith

@tunesmith : I did some digging, and decided that the strange behavior I was seeing was actually coming from the underlying d3 zoom library. I filed an issue on it @ https://github.com/d3/d3-zoom/issues/252. As noted in the issue, I could pretty easily imagine the current behavior as being intended and right for the common use case, so it may end up being a feature request. Obviously, no guarantee that this is the same problem as you were running into, but it seems like it might be related. We'll see what the maintainers say.

curiouserrandy avatar May 22 '22 21:05 curiouserrandy

@curiouserrandy Ah wow, thank you for filing that. And yes, it does have similarities with what I've experienced elsewhere, with cytoscape.js, on a project I have that uses elkjs for layout. You can see an animated gif of the behavior in the ticket I filed here: https://github.com/cytoscape/cytoscape.js/issues/2966. Unfortunately, developing a fix for it was beyond me and the issue closed.

tunesmith avatar May 23 '22 03:05 tunesmith

@curiouserrandy First off, sorry for the radio silence on my end. I've also been busy with various things, but should have more time to devote to this now.

Thanks for the update! Sorry about the car accident, but glad you seem okay! What you implemented seems about like what I expected, and it should be pretty easy to just add a callback function to sugiyama that takes that intersection function just adjust the edge control points. It seems like this is "working" at the moment though, so it also seems like not a pressing need.

I'm also toying with adding support for nodes with arbitrary height. And the interest by multiple people makes me think that it's worth looking at dynadag again...

erikbrinkman avatar May 26 '22 03:05 erikbrinkman

There is some support for interactive use now. Currently it still just does new layouts. For the small to medium size graphs this is designed for, that's not super time intensive, and even if I did further updates, it'd ;ike be tweaks to the existing algorithms to use the previous node positions, which shouldn't be too hard.

Going to close for now, but feel free to reopen.

erikbrinkman avatar Jul 02 '23 17:07 erikbrinkman