networkx icon indicating copy to clipboard operation
networkx copied to clipboard

Implemented topo_layout for DAGs

Open KOLANICH opened this issue 5 years ago • 4 comments

index

KOLANICH avatar Jan 23 '20 15:01 KOLANICH

Hello @KOLANICH! Thanks for updating this PR.

Line 1211:5: E265 block comment should start with '# ' Line 1227:13: E265 block comment should start with '# ' Line 1228:13: E265 block comment should start with '# ' Line 1229:1: W293 blank line contains whitespace Line 1242:9: E741 ambiguous variable name 'l' Line 1284:5: E303 too many blank lines (2) Line 1307:89: E501 line too long (170 > 88 characters) Line 1308:89: E501 line too long (153 > 88 characters) Line 1309:89: E501 line too long (114 > 88 characters) Line 1310:89: E501 line too long (129 > 88 characters) Line 1335:89: E501 line too long (115 > 88 characters) Line 1362:89: E501 line too long (591 > 88 characters)

Please install and run psf/black.

Comment last updated at 2020-07-21 13:38:42 UTC

pep8speaks avatar Jan 23 '20 15:01 pep8speaks

Can you describe what this is and when a user would want to consider it? Also... any citations/references?

dschult avatar Jan 25 '20 01:01 dschult

No, it's a self-invented heuristics. Completely inefficient. And defective:

  • it doesn't address the situation when edges overlap. For example an edge (0, 0) -> (3, 3) completely occludes (1, 1) -> (2, 2), but I don't know how to implement detection and elimination of such pairs of edges efficiently.
  • it doesn't eliminate unneeded edge intersections and doesn't optimize their count
  • sometimes it generates weird layouts
  • on random DAGs with odd count of vertices it assigns each vertex to an own layer

Though the algo is split into 2 functions each of them can be improved separately. But I am not going to do that, for my purposes (drawing a small enough dependency tree of packages) the algo works.

Its only good point is that it takes not so many lines of code and that it is conceptually naïve and straightforward. It was the main goal - not to code the whole Sugiyama algo which is more than 1000 LOC in C and not to code Reingold Tilford algo which is more than 500 LOC in C.

The algo does the following:

  1. assigns vertices to layers. To do it it

    • assigns each vertex a unique number
    • walks vertices and moves them into the layer further than the maximum layer of its either in or out neighbours
    • repeats until convergence
  2. Orders vertices within layers

    • puts the vertices on a square grid (i, j) where i is a number of the layer and j is the number within layer
    • initializes spring_layout with this grid
    • does spring layout hoping it would reduce intersections and bring related vertices closer to each other.
    • rotates it so to make the first coordinate have the largest variance or maximizing 2d "uniformness-like" of the distribution (in fact not uniformness, all ICA impls use some surrogate for it, and uniformness is itself a proxy to statistical independence). PCA or ICA is used depending on what is available. If scikit-learn is available it uses ICA. If only numpy is available it uses an own small PCA impl.
    • takes the coord with the largest variance, uses ordering in it to determine the order in layers
    • reinitializes the layers bringing the vertices back to the grid and repeats the loop for some iterations

No formal proofs, just heuristics.

KOLANICH avatar Jan 25 '20 08:01 KOLANICH

That's cool. Please put that into a "Notes" section in the doc_string. It'll help the users avoid having to figure out what it does and when to use it. (and maybe prod someone to implement the more complex algorithms.)

dschult avatar Jan 25 '20 17:01 dschult