pygraphistry
[BUG] ValueError: Expected Pandas/Arrow/cuDF/Spark dataframe(s) or igraph/NetworkX graph when calling spark.sql()
Describe the bug
The following code used to work but now throws an error. Presumably the type of the resulting df changed from SparkDataFrame to pyspark.sql.connect.dataframe.DataFrame:
df = spark.sql("SELECT * FROM honeypot")
g2 = graphistry.edges(df, 'attackerIP', 'victimIP')
g2.plot()
Simply calling .toPandas() on the df before passing it to edges() works around the problem, but the client should handle this case itself.
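Until the client handles this, the workaround can be wrapped in a small helper. This is only a sketch: `to_pandas_if_possible` is a hypothetical name, not part of pygraphistry, and it assumes the object exposes a `toPandas()` method, as both the classic and the Spark Connect DataFrame do. Note that `toPandas()` collects all rows onto the driver, so it is only reasonable for data that fits in memory.

```python
def to_pandas_if_possible(df):
    """Hypothetical helper: return df.toPandas() if the object has that
    method (Spark and Spark Connect DataFrames do), else return df as-is."""
    to_pandas = getattr(df, "toPandas", None)
    if callable(to_pandas):
        return to_pandas()
    return df


# Intended use in the notebook (assumes `spark` and `graphistry` are set up):
#   df = spark.sql("SELECT * FROM honeypot")
#   g2 = graphistry.edges(to_pandas_if_possible(df), 'attackerIP', 'victimIP')
#   g2.plot()
```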
error:
ValueError: Expected Pandas/Arrow/cuDF/Spark dataframe(s) or igraph/NetworkX graph.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File <command-2934552628071172>, line 1
----> 1 g.plot()
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/graphistry/PlotterBase.py:1404, in PlotterBase.plot(self, graph, nodes, name, description, render, skip_upload, as_files, memoize, extra_html, override_html_style)
1401 PyGraphistry.refresh()
1402 logger.debug("4. @PloatterBase plot: PyGraphistry.org_name(): {}".format(PyGraphistry.org_name()))
-> 1404 dataset = self._plot_dispatch(g, n, name, description, 'arrow', self._style, memoize)
1405 if skip_upload:
1406 return dataset
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/graphistry/PlotterBase.py:1701, in PlotterBase._plot_dispatch(self, graph, nodes, name, description, mode, metadata, memoize)
1698 except ImportError:
1699 pass
-> 1701 error('Expected Pandas/Arrow/cuDF/Spark dataframe(s) or igraph/NetworkX graph.')
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/graphistry/util.py:280, in error(msg)
279 def error(msg):
--> 280 raise ValueError(msg)
ValueError: Expected Pandas/Arrow/cuDF/Spark dataframe(s) or igraph/NetworkX graph.
To Reproduce
Lab 2 - Data Preparation and Styling-ExpectedPandasArrowSparkDataframe.zip
We should support multiple Spark versions. This potentially affects the following:
- Spark availability sniffing: https://github.com/graphistry/pygraphistry/blob/2506b798ec723e906c1c5279f613fe0c37bdbad2/graphistry/PlotterBase.py#L80
- Dispatch: https://github.com/graphistry/pygraphistry/blob/2506b798ec723e906c1c5279f613fe0c37bdbad2/graphistry/PlotterBase.py#L1682
- Arrow coercion: https://github.com/graphistry/pygraphistry/blob/2506b798ec723e906c1c5279f613fe0c37bdbad2/graphistry/PlotterBase.py#L1901
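One possible direction for the sniffing and dispatch steps is to recognize a Spark DataFrame by its class name and module rather than by an `isinstance` check against a single imported class. The sketch below is an assumption about how that could look, not the current PlotterBase code; `is_spark_dataframe` is a hypothetical name, and the rule "class named DataFrame in a module under pyspark.sql" is a guess meant to cover both pyspark.sql.dataframe and pyspark.sql.connect.dataframe.

```python
def is_spark_dataframe(obj) -> bool:
    """Hypothetical check: True if obj, or any of its base classes, is a
    class named 'DataFrame' defined in a module under 'pyspark.sql'.

    Works without importing pyspark, so it is safe when Spark is absent.
    """
    for klass in type(obj).__mro__:
        module = getattr(klass, "__module__", "") or ""
        if klass.__name__ == "DataFrame" and (
            module == "pyspark.sql" or module.startswith("pyspark.sql.")
        ):
            return True
    return False
```

A dispatch branch could then call `is_spark_dataframe(graph)` and convert through `toPandas()` before the Arrow coercion step, instead of falling through to the ValueError.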