cugraph
                        [BUG] MG Property Graph add_vertex_data crashes
Describe the bug
When I try to add vertex data to an MG Property Graph with add_vertex_data, the call crashes with this traceback:
Cell In [21], line 17
     13 #ddf = gdf
     15 print(f"read recs {start_id} to {end_id} and now adding to PG")
---> 17 pG.add_vertex_data(ddf, vertex_col_name='id', type_name='paper')
     19 #print(f"PG now contains {pG.get_num_vertices()} ")
     22 rec_read = end_id

File ~/anaconda3/envs/cugraph_dev/lib/python3.9/site-packages/cugraph-22.8.0a0+166.gd98ddc69-py3.9-linux-x86_64.egg/cugraph/dask/structure/mg_property_graph.py:405, in EXPERIMENTAL__MGPropertyGraph.add_vertex_data(self, dataframe, vertex_col_name, type_name, property_columns)
    398 # Ensure that both the predetermined vertex ID column name and vertex
    399 # type column name are present for proper merging.
    400
    401 # NOTE: This copies the incoming DataFrame in order to add the new
    402 # columns. The copied DataFrame is then merged (another copy) and then
    403 # deleted when out-of-scope.
    404 tmp_df = dataframe.copy()
--> 405 tmp_df[self.vertex_col_name] = tmp_df[vertex_col_name]
    406 # FIXME: handle case of a type_name column already being in tmp_df
    407 tmp_df[self.type_col_name] = type_name

...

File ~/anaconda3/envs/cugraph_dev/lib/python3.9/site-packages/numpy/core/_methods.py:44, in _amin(a, axis, out, keepdims, initial, where)
     42 def _amin(a, axis=None, out=None, keepdims=False,
     43          initial=_NoValue, where=True):
---> 44     return umr_minimum(a, axis, None, out, keepdims, initial, where)

TypeError: '<=' not supported between instances of 'str' and 'int'
Thanks, I can reproduce this. The error is actually in dask. Here is an example that goes through a similar code path and gives the same error:
import dask.dataframe as dd
import pandas as pd
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], 1:[5, 6]})
ddf = dd.from_pandas(df, npartitions=2)
ddf["c"] = df["a"]  # <-- gives the error you see (column names mix str and int)
df.mean(axis=0)  # pandas handles the mixed column names without error
ddf.mean(axis=0)  # <-- dask gives a similar error
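The root cause can be seen without dask at all: any reduction that orders the column labels (like the NumPy `min` at the bottom of the traceback) needs comparisons, and Python cannot order a `str` against an `int`. A minimal stdlib sketch, assuming only the list of labels is available (e.g. `list(ddf.columns)`); `has_mixed_str_labels` and `try_min` are hypothetical helpers, not part of dask or cugraph. The operator in the message may differ (`<` here versus `<=` in the NumPy traceback), but the cause is the same.

```python
def has_mixed_str_labels(labels):
    """True if the labels mix str with any other type (e.g. "a" and 1).

    Such a mix cannot be ordered, so any sort/min/max over the labels fails.
    Mixing only numeric types (int and float) is still orderable.
    """
    types = {type(label) for label in labels}
    return str in types and len(types) > 1


def try_min(labels):
    """Mimic a reduction over the labels; return the error text on failure."""
    try:
        return min(labels)
    except TypeError as exc:
        return f"TypeError: {exc}"


cols = ["a", "b", 1]  # same labels as the dask reproducer above
print(has_mixed_str_labels(cols))  # True
print(try_min(cols))               # fails with a str/int comparison TypeError

str_cols = [str(c) for c in cols]
print(has_mixed_str_labels(str_cols))  # False
print(try_min(str_cols))               # 1
```

Checking the labels up front like this would let calling code fail with a clearer message before the data ever reaches dask.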
A workaround is to give all column names the same dtype, for example by converting them to strings:
gdf.columns = gdf.columns.astype(str)
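One caveat with converting to strings: if a frame already has both `1` and `"1"` as column names, `astype(str)` silently produces two columns named `"1"`. A stdlib sketch of the same normalization with a collision check, assuming the labels are handled as a plain list; `stringify_labels` is a hypothetical helper, and its result could be assigned back with something like `gdf.columns = stringify_labels(list(gdf.columns))`.

```python
from collections import Counter


def stringify_labels(labels):
    """Convert every column label to str, refusing to create duplicates.

    Same effect as `columns.astype(str)` on a list of labels, but raises
    instead of silently producing duplicate names (e.g. from 1 and "1").
    """
    new_labels = [str(label) for label in labels]
    dupes = sorted(name for name, n in Counter(new_labels).items() if n > 1)
    if dupes:
        raise ValueError(f"duplicate column names after str conversion: {dupes}")
    return new_labels


print(stringify_labels(["a", "b", 1]))  # ['a', 'b', '1']
```

Raising here is a design choice: a silent duplicate column would likely cause a harder-to-trace failure later, for example when selecting the vertex ID column by name.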
Fixing in https://github.com/dask/dask/pull/9485
This issue can be closed.
This is fixed in dask version 2022.9.1, which was released on September 19.
We may need to pin a minimum dask version: dask>=2022.9.1
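Besides declaring `dask>=2022.9.1` as a requirement, code could check the installed version at runtime. A stdlib sketch, assuming dask's calendar-style version strings (`YEAR.MONTH.PATCH`, possibly with a suffix); `dask_has_fix` and `installed_dask_has_fix` are hypothetical helpers. `packaging.version.Version` would be the more robust comparison; this version avoids the extra dependency.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_DASK = (2022, 9, 1)  # first dask release with the fix, per this thread


def parse_calver(text):
    """Take the leading numeric parts, e.g. '2022.9.1+12.gabc' -> (2022, 9, 1)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def dask_has_fix(text):
    """True if the version string is at or above MIN_DASK (tuple comparison)."""
    return parse_calver(text) >= MIN_DASK


def installed_dask_has_fix():
    """Check the installed dask; return None if dask is not installed."""
    try:
        return dask_has_fix(version("dask"))
    except PackageNotFoundError:
        return None


print(dask_has_fix("2022.9.0"))   # False
print(dask_has_fix("2022.9.1"))   # True
print(dask_has_fix("2022.10.0"))  # True
```

Because only the leading numbers are compared, a pre-release such as `2022.9.1rc1` would count as meeting the minimum, which a PEP 440 comparison would not accept.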
The workaround is to use only strings for column names, without mixing strings, ints, etc.
Closed via https://github.com/dask/dask/pull/9485