networkx
networkx copied to clipboard
Implemented topo_layout for DAGs

Hello @KOLANICH! Thanks for updating this PR.
- In the file
networkx/drawing/layout.py:
Line 1211:5: E265 block comment should start with '# ' Line 1227:13: E265 block comment should start with '# ' Line 1228:13: E265 block comment should start with '# ' Line 1229:1: W293 blank line contains whitespace Line 1242:9: E741 ambiguous variable name 'l' Line 1284:5: E303 too many blank lines (2) Line 1307:89: E501 line too long (170 > 88 characters) Line 1308:89: E501 line too long (153 > 88 characters) Line 1309:89: E501 line too long (114 > 88 characters) Line 1310:89: E501 line too long (129 > 88 characters) Line 1335:89: E501 line too long (115 > 88 characters) Line 1362:89: E501 line too long (591 > 88 characters)
Please install and run psf/black.
Comment last updated at 2020-07-21 13:38:42 UTC
Can you describe what this is and when a user would want to consider it? Also... any citations/references?
No, it's a self-invented heuristics. Completely inefficient. And defective:
- it doesn't address the situation when edges overlap. For example an edge (0, 0) -> (3, 3) completely occludes (1, 1) -> (2, 2), but I don't know how to implement detection and elimination of such pairs of edges efficiently.
- it doesn't eliminate unneeded edge intersections and doesn't optimize their count
- sometimes it generates weird layouts
- on random DAGs with odd count of vertices it assigns each vertex to an own layer
Though the algo is split into 2 functions each of them can be improved separately. But I am not going to do that, for my purposes (drawing a small enough dependency tree of packages) the algo works.
Its only good point is that it takes not so many lines of code and that it is conceptually naïve and straightforward. It was the main goal - not to code the whole Sugiyama algo which is more than 1000 LOC in C and not to code Reingold Tilford algo which is more than 500 LOC in C.
The algo does the following:
-
assigns vertices to layers. To do it it
- assigns each vertex a unique number
- walks vertices and moves them into the layer further than the maximum layer of its either in or out neighbours
- repeats until convergence
-
Orders vertices within layers
- puts the vertices on a square grid
(i, j)whereiis a number of the layer andjis the number within layer - initializes spring_layout with this grid
- does spring layout hoping it would reduce intersections and bring related vertices closer to each other.
- rotates it so to make the first coordinate have the largest variance or maximizing 2d "uniformness-like" of the distribution (in fact not uniformness, all ICA impls use some surrogate for it, and uniformness is itself a proxy to statistical independence). PCA or ICA is used depending on what is available. If
scikit-learnis available it uses ICA. If onlynumpyis available it uses an own small PCA impl. - takes the coord with the largest variance, uses ordering in it to determine the order in layers
- reinitializes the layers bringing the vertices back to the grid and repeats the loop for some iterations
- puts the vertices on a square grid
No formal proofs, just heuristics.
That's cool. Please put that into a "Notes" section in the doc_string. It'll help the users avoid having to figure out what it does and when to use it. (and maybe prod someone to implement the more complex algorithms.)