[BUG] Rendering restarts until timeout for huge graphs
Describe the bug When I plot a graph with 95,157 nodes and 12,000,000 edges, Graphistry attempts to render it, but near the end of the render it "refreshes" the process and starts over. This cycle repeats until it enters "herding stray GPUs" and then times out.
I would like to add that I tried this with 1,000 nodes and 2,000 edges and it behaves the same way. If I try it at a different time, it sometimes works.
To Reproduce Code, including data, that can be run without editing:
import matplotlib.cm as cm
import matplotlib.colors as mcolors
import numpy as np
import pandas as pd
import scipy.sparse as sp
import graphistry
#graphistry.register(api=3, username='...', password='...')
num_nodes = 95157
num_edges = 12000000
num_clusters = 744
print(f"Generating a sparse graph with {num_nodes} nodes and {num_edges} edges...")
rows = np.random.randint(0, num_nodes, num_edges)
cols = np.random.randint(0, num_nodes, num_edges)
data = np.ones(num_edges, dtype=int)
adj_matrix_csr = sp.csr_matrix((data, (rows, cols)), shape=(num_nodes, num_nodes))
adj_matrix_csr.setdiag(0)
adj_matrix_csr.eliminate_zeros()
source_nodes, target_nodes = adj_matrix_csr.nonzero()
edges_df = pd.DataFrame({
'source': source_nodes,
'destination': target_nodes,
'edge_weight': 1
})
edges_df.drop_duplicates(subset=['source', 'destination'], inplace=True)
print(f"Generated {len(edges_df)} unique edges.")
cluster_labels = np.random.randint(0, num_clusters, num_nodes)
nodes_df = pd.DataFrame({
'node': np.arange(num_nodes),
'type': cluster_labels
})
print(f"Generated {len(nodes_df)} nodes with {744} cluster labels.")
print("Binding data to PyGraphistry and plotting...")
g = graphistry.edges(edges_df, 'source', 'destination').nodes(nodes_df, 'node')
# Note: cm.get_cmap was removed in matplotlib 3.9; on newer versions use matplotlib.colormaps['viridis'].resampled(num_clusters)
cmap = cm.get_cmap('viridis', num_clusters)  # 'hsv' or 'rainbow' maximize distinctness, but are not perceptually uniform
colors_list = [mcolors.rgb2hex(cmap(i)) for i in range(num_clusters)]
custom_cluster_colors_all = {i: colors_list[i] for i in range(num_clusters)}
g = g.encode_point_color(
'type',
categorical_mapping=custom_cluster_colors_all
).plot()
print("Plotting command issued. Check your browser or Jupyter output for the visualization.")
g
Expected behavior The graph should render with the colored nodes.
Actual behavior The rendering process keeps restarting when it is almost complete.
Screenshots
https://github.com/user-attachments/assets/082c8707-00d6-46ea-ad3a-5dcb548f666a
Browser environment (please complete the following information):
- OS: macOS
- Browser: Chrome, Firefox
- Version: Chrome 138
Graphistry GPU server environment
- Where run: Hub
PyGraphistry API client environment
- Where run: Graphistry 2.43.4, Jupyter Notebook 4.3.6
- Version: 0.41.0
- Python Version: 3.13.2
Thanks! 12M is probably too big for the server (and we try to cap on shared Hub before then); normally we recommend staying under 2M edges. I think the auto-recovery of shared resources was taking a while to auto-heal, so it seems fine now, but that's probably the deeper issue ;-)
Fwiw, we are actively working on ~10X'ing that and bringing more to Hub, though there is no ETA. On the backend side, GFQL already handles 1B+ edges, and I recall 100M+ working for uploads, but there is still more to do for the layout/rendering tiers.
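For reference, a minimal sketch of staying under that guideline by sampling the edge list before upload; the 2,000,000 cap and random_state are assumptions, and the column names match the repro above:

import pandas as pd
import graphistry

MAX_EDGES = 2_000_000  # assumed cap based on the "< 2M edges" guidance above

def plot_capped(edges_df, nodes_df, max_edges=MAX_EDGES):
    # Uniformly sample edges when over the cap; the node table is left unchanged
    if len(edges_df) > max_edges:
        edges_df = edges_df.sample(n=max_edges, random_state=42)
    g = graphistry.edges(edges_df, 'source', 'destination').nodes(nodes_df, 'node')
    return g.plot()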
Sorry, I think I'm misunderstanding you @lmeyerov. GFQL is a query language, and to visualize the graph we still need Hub, right? In other words, there's no way to visualize 12M edges, correct?
Yes, exactly: the other components can already handle 100M-2B, and we are working right now to get the renderer to 10-20M.
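A minimal sketch of using GFQL to cut the graph down to a renderable neighborhood before plotting; the seed filter ('type' == 0) and hop count are illustrative assumptions, not a recommendation from the thread:

from graphistry import n, e_forward

# Start from an arbitrary seed cluster, follow outgoing edges up to 2 hops,
# and render only that subgraph instead of the full 12M-edge graph.
g_small = g.chain([
    n({'type': 0}),      # seed nodes (hypothetical filter on the repro's 'type' column)
    e_forward(hops=2),   # traverse outgoing edges up to 2 hops
    n(),                 # include the reached nodes
])
g_small.plot()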
@lmeyerov It seems like Hub is also facing issues rendering 1M edges. Any chance this is account-specific (using too many resources in a short amount of time)? Sorry, I just need to know this for future reference.
Unclear - we regularly handle 1-2M edges, though we can see instability in cases like this.
(We are working on better auto-healing for the blowouts we have been seeing; it's unclear whether it was you or a noisy neighbor on the shared infra.)
Hi @lmeyerov, sorry to be a constant bother around this topic. I noticed in the past 2 days that the number of nodes was the bottleneck: I tried 95,157 nodes (as I've always done) and 1M edges without any luck.
So I scaled down the number of nodes and edges to see where the highest capacity was, and it hovers around 10,000 nodes and 20,000 edges at the moment.
Does this mean that there are not many resources to go around on the shared infra? Does Graphistry offer a paid tier to get access to dedicated resources? Sorry for my ignorance.
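A rough sketch of that scale-down probe, assuming the edges_df/nodes_df from the repro above; the size ladder, render=False usage, and sampling seed are illustrative choices:

import pandas as pd
import graphistry

def probe_render_capacity(edges_df, nodes_df, sizes=(1_000_000, 100_000, 20_000)):
    # Try progressively smaller random edge samples until a plot succeeds
    for n_edges in sizes:
        sample = edges_df.sample(n=min(n_edges, len(edges_df)), random_state=0)
        keep = pd.unique(sample[['source', 'destination']].values.ravel())
        sub_nodes = nodes_df[nodes_df['node'].isin(keep)]
        try:
            url = (graphistry.edges(sample, 'source', 'destination')
                             .nodes(sub_nodes, 'node')
                             .plot(render=False))  # return the share URL instead of embedding
            print(f"{n_edges} edges rendered: {url}")
            return n_edges
        except Exception as exc:
            print(f"{n_edges} edges failed: {exc}")
    return None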
@aucahuasi I suspect the full-GPU issue is hitting @shon3005, even without full-on out-of-capacity restarts, just on simple medium-graph usage.
@shon3005 We do have self-hosting tiers where you control the GPUs; e.g., AWS g4dn.xl instances go far, see graphistry.com/get-started, though the free tier should support what you're doing!
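A minimal sketch of pointing the PyGraphistry client at a self-hosted server instead of Hub; the hostname is a placeholder for your own deployment:

import graphistry

graphistry.register(
    api=3,
    protocol='https',
    server='graphistry.example.internal',  # placeholder self-hosted address
    username='...',
    password='...',
)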
@lmeyerov @aucahuasi I gave this a shot a couple of days later and am still running into the GPU restarts.
Thank you, @shon3005. We're currently investigating the issue!
We have brought the service back online, but we are continuing to investigate the incident to understand and resolve the root cause of the issue.