networkD3

sankey network with too many nodes to display horizontally makes layout go crazy

Open wlandau opened this issue 5 years ago • 12 comments

Sankey networks have been super useful in drake. However, I am noticing rendering problems when the network is super large: https://github.com/ropensci/drake/issues/836.

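library(networkD3)
n <- 50  # number of target nodes
# Nodes: ids 0..n; sankeyNetwork() expects 0-based node indices
nodes <- data.frame(
  id = seq(0L, n)
)
# Links: node 0 links to each of nodes 1..n with value 1
edges <- data.frame(
  from = rep(0L, n),
  to = seq_len(n),
  value = 1L
)
sankeyNetwork(
  Links = edges,
  Nodes = nodes,
  Source = "from",
  Target = "to",
  Value = "value"
)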

Created on 2019-04-16 by the reprex package (v0.2.1)

Is there a way to improve how the edges render on the left? The desired result is similar to the graph produced by the smaller example below (n = 10).

library(networkD3)
n <- 10  # same graph as above, but with only 10 target nodes
nodes <- data.frame(
  id = seq(0L, n)
)
edges <- data.frame(
  from = rep(0L, n),
  to = seq_len(n),
  value = 1L
)
sankeyNetwork(
  Links = edges,
  Nodes = nodes,
  Source = "from",
  Target = "to",
  Value = "value"
)

Created on 2019-04-16 by the reprex package (v0.2.1)

wlandau, Apr 16 '19 15:04

Sort the data in the order you want it displayed, and set iterations = 0.

cjyetman, Apr 16 '19 15:04

@cjyetman, would you elaborate on how to sort the data in the above reprex? I was under the impression that the row order of Nodes and Links did not matter, and iterations = 0 by itself does not remove the rendering issue.

wlandau, Apr 16 '19 17:04

Setting iterations = 0 effectively disables the algorithm that automatically determines node placement, so the nodes are placed in the order in which they appear in the Nodes data frame.
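A minimal sketch of sorting plus iterations = 0 in R. It assumes networkD3's convention that Source and Target are 0-based row positions in Nodes, so reordering the rows of Nodes also means remapping the link endpoints. reorder_nodes() is a hypothetical helper written for this illustration, not part of networkD3, and the reverse-id ordering is an arbitrary example.

```r
# Hypothetical helper (not part of networkD3): reorder the rows of `nodes`
# and remap the 0-based `from`/`to` columns of `edges` so that every link
# still points at the same node after the reordering.
reorder_nodes <- function(nodes, edges, new_order) {
  # new_order: 1-based row indices of `nodes`, in the desired display order
  position <- integer(nrow(nodes))
  position[new_order] <- seq_along(new_order) - 1L  # old row -> new 0-based index
  edges$from <- position[edges$from + 1L]
  edges$to <- position[edges$to + 1L]
  list(nodes = nodes[new_order, , drop = FALSE], edges = edges)
}

n <- 50
nodes <- data.frame(id = seq(0L, n))
edges <- data.frame(from = rep(0L, n), to = seq_len(n), value = 1L)

# Arbitrary example ordering: source node first, then targets in reverse id
# order. An application-specific ordering would go here instead.
sorted <- reorder_nodes(nodes, edges, c(1L, rev(seq_len(n) + 1L)))

if (requireNamespace("networkD3", quietly = TRUE)) {
  # iterations = 0 skips the automatic layout, so nodes keep the row order
  # of sorted$nodes. Print `widget` to display it.
  widget <- networkD3::sankeyNetwork(
    Links = sorted$edges, Nodes = sorted$nodes,
    Source = "from", Target = "to", Value = "value",
    NodeID = "id", iterations = 0
  )
}
```

Remapping matters because links refer to nodes by position: sorting Nodes alone would silently reconnect links to different nodes.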

cjyetman, Apr 16 '19 17:04

How would you recommend I determine node order? How do different node orders affect the positions of the edges?

I am developing a reusable tool on top of networkD3, and I cannot predict in advance the graphs it will need to render. So in my case, finding an optimal node order would need to be fully automated.

wlandau, Apr 16 '19 19:04

The desired order of nodes is entirely up to you, and likely application dependent, so I can’t make a recommendation there.

I understood from your original message that you don’t want the edges to cross over each other as they do in the first image and do not in the second. The underlying JavaScript runs an iterative algorithm to achieve an “optimal” layout, starting with the node positions in the order they appear in the data, then iteratively adjusting their positions to approach an “optimal” layout. That is actually the primary purpose of the function. If you don’t want that algorithm to adjust the positions, then set iterations to 0 and what you see in your first image will not occur.
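To make the iterative adjustment concrete, here is a simplified, single-column sketch of the relax-then-resolve-collisions step that d3-sankey-style layouts repeat on each iteration. It is an illustration written under that assumption, not the JavaScript that networkD3 actually runs; relax_column() and its arguments are hypothetical.

```r
# Simplified sketch of one relaxation step for a single column of nodes.
#   y             top position of each node (pixels)
#   h             height of each node (pixels)
#   target_centre value-weighted mean centre of each node's linked neighbours
#   alpha         step size in (0, 1]; typically shrinks over iterations
#   padding       vertical gap between nodes
#   extent        available height of the plot
relax_column <- function(y, h, target_centre, alpha, padding, extent) {
  # 1. Relaxation: move each node part of the way toward its neighbours.
  y <- y + (target_centre - (y + h / 2)) * alpha

  # 2. Collision resolution, top to bottom: push overlapping nodes down.
  ord <- order(y)
  y0 <- 0
  for (i in ord) {
    if (y[i] < y0) y[i] <- y0
    y0 <- y[i] + h[i] + padding
  }

  # 3. If the column now runs past the bottom, push nodes back up.
  overflow <- y0 - padding - extent
  if (overflow > 0) {
    last <- ord[length(ord)]
    y[last] <- y[last] - overflow
    y0 <- y[last]
    for (i in rev(ord)[-1]) {
      dy <- y[i] + h[i] + padding - y0
      if (dy > 0) y[i] <- y[i] - dy
      y0 <- y[i]
    }
  }
  y
}

# Enough room: nodes are spread out below each other.
relax_column(c(0, 5, 10), c(10, 10, 10), c(15, 15, 15),
             alpha = 1, padding = 2, extent = 100)  # c(10, 22, 34)
# Too little room: step 3 pushes the top nodes above zero, off the plot.
relax_column(c(0, 5, 10), c(10, 10, 10), c(15, 15, 15),
             alpha = 1, padding = 2, extent = 20)   # c(-14, -2, 10)
```

In this simplified model, a column whose nodes and padding need more height than is available cannot be resolved, which is one way a crowded column can produce a broken-looking layout.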

cjyetman, Apr 16 '19 19:04

Sorry, now that I've had a chance to look at your images on a larger screen, I can see that the problem is not exactly what I thought it was. I will get back to you with a solution shortly.

cjyetman, Apr 16 '19 19:04

Essentially what seems to be happening is that when there is not enough horizontal space to plot the nodes in a given x-position/column (in this case, the right column of nodes), the layout goes bonkers. The easy fix is to increase the horizontal size of your viewer and refresh it, but that's obviously not an ideal solution. My guess is that a better solution will take a good bit of work, so I wouldn't expect a quick resolution unless someone jumps in with significant time and experience.

cjyetman, Apr 16 '19 21:04

Thanks for looking into this.

To take a step back here, how would you recommend I use networkD3 to visualize dependency graphs for drake? drake's graphs:

  • Are arbitrary DAGs.
  • Should be able to handle tens of thousands of nodes.
  • Should use a layout that conveys a hierarchical arrangement of dependencies. (Sugiyama does not go far enough.)
  • Should flow from left to right.
  • Should respect a 1:1 aspect ratio.

Existing examples are here. They render nicely for small workflows, but I do not have a one-size-fits-all solution for large workflows.

wlandau, Apr 17 '19 15:04

Realistically, any plot with "tens of thousands" of discrete nodes is going to be messy at best, if not totally unusable... unless you're plotting at a very high resolution. Each node is going to need at least one pixel plus a one-pixel boundary so that it can be distinguished from the other nodes, so a 500x500 pixel plot can show about 125,000 nodes in a checkerboard pattern. But that's with no layout optimization and no edges plotted, and the nodes and boundaries would be so small they would barely be visible. Make the nodes a bit bigger, increase the boundary between them, and add edges that are distinguishable, and that number drops drastically, and you still haven't done any layout, which should be the goal. So the number of pixels you have available is really the main constraint. Tens of thousands of nodes is probably unrealistic, but I'd be happy to be proven wrong.
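The pixel budget can be checked with a little arithmetic. The sketch below assumes square nodes on a regular grid with a fixed gap between neighbouring nodes; that assumption is stricter than the checkerboard pattern, which lets diagonal neighbours touch. grid_capacity() is a hypothetical helper for this illustration.

```r
# Hypothetical helper: rough upper bound on how many square nodes fit in a
# plot, assuming a regular grid with `gap_px` empty pixels between
# neighbouring nodes. k nodes in a row need k * node_px + (k - 1) * gap_px
# pixels, hence the (size + gap) / (node + gap) form below.
grid_capacity <- function(width_px, height_px, node_px, gap_px) {
  per_row <- floor((width_px + gap_px) / (node_px + gap_px))
  per_col <- floor((height_px + gap_px) / (node_px + gap_px))
  per_row * per_col
}

500 * 500 / 2                  # checkerboard of 1-pixel nodes: 125000
grid_capacity(500, 500, 1, 1)  # 1-pixel nodes, 1-pixel gaps: 62500
grid_capacity(500, 500, 5, 3)  # 5-pixel nodes, 3-pixel gaps: 3844
```

Going from 1-pixel to 5-pixel nodes with 3-pixel gaps cuts the capacity by more than an order of magnitude, before any edges or layout constraints are considered.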

cjyetman, Apr 17 '19 19:04

A fair point. What about less extreme scenarios, say, in the hundreds of vertices, with minor overlap if the rendered nodes are large enough?

wlandau, Apr 17 '19 20:04

If you have a target resolution (e.g. the RStudio Viewer pane at about 1/4 of the screen), you could estimate when you will have too many nodes to display there and open the plot in a browser automatically, or warn the user that they may have to open the plot in a larger window to see it properly?
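A minimal sketch of such a check. It assumes, as in the d3-sankey layout that networkD3 builds on, that the nodes in each column are stacked vertically with nodePadding pixels between them (10 by default in sankeyNetwork()), so a column of k nodes needs roughly (k - 1) * nodePadding pixels of padding plus some minimum height per node. column_sizes() and fits_in_height() are hypothetical helpers, the 1-pixel minimum node height is an arbitrary choice, and the 400-pixel viewer height is only an example target.

```r
# Hypothetical helpers, not part of networkD3.
# Count nodes per column, where a node's column is its longest-path
# distance from a source node. Assumes a DAG with 0-based `from`/`to`.
column_sizes <- function(edges, n_nodes) {
  depth <- integer(n_nodes)
  for (pass in seq_len(n_nodes)) {  # a DAG needs at most n_nodes - 1 passes
    changed <- FALSE
    for (i in seq_len(nrow(edges))) {
      s <- edges$from[i] + 1L
      t <- edges$to[i] + 1L
      if (depth[t] < depth[s] + 1L) {
        depth[t] <- depth[s] + 1L
        changed <- TRUE
      }
    }
    if (!changed) break
  }
  # Mimic sinksRight = TRUE: nodes with no outgoing links go in the last column.
  sinks <- setdiff(seq_len(n_nodes) - 1L, edges$from) + 1L
  depth[sinks] <- max(depth)
  tabulate(depth + 1L)
}

# TRUE if the tallest column fits in height_px under the assumptions above.
fits_in_height <- function(edges, n_nodes, height_px,
                           node_padding = 10, min_node_px = 1) {
  k <- max(column_sizes(edges, n_nodes))
  (k - 1) * node_padding + k * min_node_px <= height_px
}

n <- 50
edges <- data.frame(from = rep(0L, n), to = seq_len(n), value = 1L)
column_sizes(edges, n + 1)  # c(1, 50): one source, 50 sinks in the last column
if (!fits_in_height(edges, n + 1, height_px = 400)) {
  warning("The sankey diagram may be too tall for the viewer; ",
          "consider opening it in a larger window or a browser.")
}
```

To open the plot in a browser instead of warning, one option is to save the widget with htmlwidgets::saveWidget() and open the resulting file with utils::browseURL().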

cjyetman, Apr 21 '19 09:04