Improve graph line rendering

Open sagfox opened this issue 3 years ago • 5 comments

Add line functionality that accepts input in the form (x_src, y_src, x_dst, y_dst) and runs on the GPU.

sagfox avatar Jul 05 '22 19:07 sagfox

Codecov Report

Merging #1100 (f826f4e) into master (8f9ec7c) will decrease coverage by 0.04%. The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1100      +/-   ##
==========================================
- Coverage   85.07%   85.02%   -0.05%     
==========================================
  Files          34       34              
  Lines        7516     7515       -1     
==========================================
- Hits         6394     6390       -4     
- Misses       1122     1125       +3     
Impacted Files                  Coverage Δ
datashader/core.py              88.11% <ø> (ø)
datashader/glyphs/polygon.py    95.33% <0.00%> (-0.67%)
datashader/glyphs/line.py       93.43% <0.00%> (-0.22%)
datashader/macros.py            92.92% <0.00%> (-0.08%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.

codecov[bot] avatar Jul 06 '22 05:07 codecov[bot]

@sagfox I have been thinking about this since we talked, and I think the first task is to identify exactly what your preferred data format is. If that is a DataFrame with 4 columns x_src, y_src, x_dst, y_dst, then Datashader already supports that via the LinesAxis1 class:

import pandas as pd
import datashader as ds
import datashader.transfer_functions as tf

cvs = ds.Canvas(plot_width=100, plot_height=100)
df = pd.DataFrame(dict(
    x_src=[2, 9, 5, 3], y_src=[1, 9, 1, 9],
    x_dst=[1, 3, 7, 9], y_dst=[5, 6, 5, 2],
))
agg = cvs.line(source=df, x=["x_src", "x_dst"], y=["y_src", "y_dst"], axis=1, agg=ds.count())
im = tf.shade(agg)
ds.utils.export_image(im, "lines_2pts", background="white")

But if your motivation is graphs with nodes and edges, where each node can have many incident edges, then this format duplicates your node coordinates. A better approach might be support for indexed lines, i.e. the nodes as a sequence of (x, y) coordinates and the edges as a sequence of (start_index, end_index) pairs that index into the node sequence. This would seem to imply two separate DataFrames of different lengths, which Datashader doesn't yet support, but that should not be insurmountable.
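Until something like that exists, an indexed layout can be expanded into the four-column form with a plain NumPy gather. The following is only a sketch of a workaround, not a Datashader feature: the `nodes` and `edges` frames, the `start_index`/`end_index` column names, and the `expand_indexed_edges` helper are all hypothetical.

```python
import pandas as pd

# Hypothetical indexed representation: one row per node, one row per edge.
nodes = pd.DataFrame({"x": [2.0, 9.0, 5.0, 3.0], "y": [1.0, 9.0, 1.0, 9.0]})
edges = pd.DataFrame({"start_index": [0, 0, 1, 2], "end_index": [1, 2, 3, 3]})


def expand_indexed_edges(nodes, edges):
    """Gather node coordinates for each edge into a four-column frame.

    Assumes the edge indices are positional (0-based) row numbers into `nodes`.
    """
    xy = nodes[["x", "y"]].to_numpy()
    src = xy[edges["start_index"].to_numpy()]  # shape (n_edges, 2)
    dst = xy[edges["end_index"].to_numpy()]    # shape (n_edges, 2)
    return pd.DataFrame({
        "x_src": src[:, 0], "y_src": src[:, 1],
        "x_dst": dst[:, 0], "y_dst": dst[:, 1],
    })


segments = expand_indexed_edges(nodes, edges)
```

The resulting `segments` frame has the same columns as the example DataFrame above, so it could be passed to `cvs.line` in the same way. Note that this still materialises the duplicated coordinates, so it only sidesteps the storage problem on the input side.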

ianthomas23 avatar Jul 07 '22 09:07 ianthomas23

> This would seem to imply two separate DataFrames of different lengths, which Datashader doesn't yet support, but that should not be insurmountable.

That would be a really useful format to support, because it would facilitate associating other columns of data with each node for use with inspect_points hovering or drilldown. Duplicating the coordinates themselves isn't too bad, but duplicating the associated metadata can get extremely expensive.

jbednar avatar Jul 07 '22 15:07 jbednar

@ianthomas23 that is interesting, I have never come across examples that demonstrate that. This could be all we need, but I was wondering about the graph edge implementation in datashader here: why is it using a path, with NaN after every 2 rows to break the path so that each edge renders as a separate line, instead of the (x_src, y_src, x_dst, y_dst) approach? Or is it the same thing underneath?

AjayThorve avatar Jul 07 '22 19:07 AjayThorve

> for the graph edge implementation in datashader here, why is it using a path, with NaN after every 2 rows, to break the path to render it as a separate line, instead of the (x_src, y_src, x_dst, y_dst) approach? Or is it the same thing underneath?

Good question! I believe the answer is simply that edge bundling was implemented by Ian Calvert before the other line formats were implemented by Jon Mease. It does sound like updating bundling to use this format would be much more efficient.
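For anyone who already has edges in the NaN-separated path layout, converting to the four-column layout is a reshape. This is a sketch under assumptions, not the code Datashader uses: it assumes columns named `x` and `y`, and exactly three rows per edge (source, destination, NaN separator, including after the last edge). The `path_to_segments` helper is hypothetical.

```python
import numpy as np
import pandas as pd


def path_to_segments(path):
    """Convert a NaN-separated path (3 rows per edge) to one row per edge."""
    xy = path[["x", "y"]].to_numpy(dtype=float)
    if len(xy) % 3 != 0:
        raise ValueError("expected three rows per edge: src, dst, NaN")
    triples = xy.reshape(-1, 3, 2)  # (n_edges, row within edge, x/y)
    if not np.isnan(triples[:, 2, :]).all():
        raise ValueError("every third row must be a NaN separator")
    return pd.DataFrame({
        "x_src": triples[:, 0, 0], "y_src": triples[:, 0, 1],
        "x_dst": triples[:, 1, 0], "y_dst": triples[:, 1, 1],
    })


# Two edges stored as a single path, each followed by a NaN separator row.
path = pd.DataFrame({
    "x": [2.0, 1.0, np.nan, 9.0, 3.0, np.nan],
    "y": [1.0, 5.0, np.nan, 9.0, 6.0, np.nan],
})
segments = path_to_segments(path)
```

The four-column frame stores two rows' worth of coordinates per edge instead of three and needs no separator rows.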

Datashader suffers a bit from a history of "drive-by" contributions: specific people added features at specific times, and because of a lack of core maintainer staffing, those features were not fully integrated with the rest of the codebase as new ones were introduced. I'm hoping that with Ian now on staff we can address issues like that, but that will only happen gradually as he starts to work on topics that overlap with a particular area of the code. So if we have a project involving edge bundling (which is on our list but quite low in priority right now), then we can revisit and update that.

jbednar avatar Jul 07 '22 21:07 jbednar