pygraphistry
[BUG] memoization fails on df's with lists
Memoization fails on the neo4j movie example, whose edges DataFrame has a list-valued column (roles). The plotter state at failure:
{'bindings': {'edges': roles _bolt_relationship_id type _bolt_start_node_id_key \
0 [Emil] 7 ACTED_IN 8
1 NaN 6 PRODUCED 7
2 NaN 5 DIRECTED 6
3 NaN 4 DIRECTED 5
4 [Agent Smith] 3 ACTED_IN 4
_bolt_end_node_id_key
0 0
1 0
2 0
3 0
4 0 ,
'nodes': name born _bolt_node_id_key type _lbl_Person \
0 Emil Eifrem 1978.0 8 Person True
1 NaN NaN 0 Movie NaN
2 Joel Silver 1952.0 7 Person True
3 Lana Wachowski 1965.0 6 Person True
4 Lilly Wachowski 1967.0 5 Person True
5 Hugo Weaving 1960.0 4 Person True
tagline title released _lbl_Movie
0 NaN NaN NaN NaN
1 Welcome to the Real World The Matrix 1999.0 True
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
5 NaN NaN NaN NaN ,
'source': '_bolt_start_node_id_key',
'destination': '_bolt_end_node_id_key',
'node': '_bolt_node_id_key',
'edge_label': None,
'edge_color': None,
'edge_size': None,
'edge_weight': None,
'edge_title': None,
'edge_icon': None,
'edge_opacity': None,
'edge_source_color': None,
'edge_destination_color': None,
'point_label': None,
'point_color': None,
'point_size': None,
'point_weight': None,
'point_title': None,
'point_icon': None,
'point_opacity': None,
'point_x': None,
'point_y': None},
'settings': {'height': 500, 'url_params': {'info': 'true'}}}
We should reproduce the failure in the memoization hash function and file an issue upstream with pandas/Arrow.
Repro:
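import hashlib
import pandas as pd

# Column 'y' is an object column whose cells are lists (plus a None)
df = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [[1], [2, 2], None]
})
# Expected to fail here with TypeError: unhashable type: 'list'
hashlib.sha256(pd.util.hash_pandas_object(df, index=True).values).hexdigest()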
Potential fix: https://github.com/bra-fsn/hashable_df/blob/master/hashable_df/init.py => https://github.com/bra-fsn/autohash/blob/master/autohash/init.py
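Until the failure is fixed upstream, one possible workaround is to normalize unhashable cells before hashing. The sketch below is a rough illustration only and is not taken from the linked libraries: `stringify_unhashable` and `hash_df` are hypothetical helper names, and it assumes that encoding list/dict/set cells as JSON text is acceptable for memoization purposes.

```python
import hashlib
import json

import pandas as pd


def stringify_unhashable(value):
    """Hypothetical helper: encode list/tuple/dict/set cells as JSON text.

    Any other value (numbers, strings, None, NaN) is returned unchanged.
    """
    if isinstance(value, (list, tuple, dict)):
        return json.dumps(value, sort_keys=True, default=str)
    if isinstance(value, (set, frozenset)):
        # Sets have no stable order, so sort by string form first
        return json.dumps(sorted(value, key=str), default=str)
    return value


def hash_df(df):
    """Hypothetical helper: SHA-256 digest of a DataFrame that may hold list cells."""
    safe = df.copy()
    for col in safe.columns:
        # Only object columns can hold Python containers
        if safe[col].dtype == object:
            safe[col] = safe[col].map(stringify_unhashable)
    row_hashes = pd.util.hash_pandas_object(safe, index=True)
    return hashlib.sha256(row_hashes.values).hexdigest()


df = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [[1], [2, 2], None]
})
digest = hash_df(df)
```

One caveat with this approach: a list cell `[1]` and a string cell `'[1]'` (or a tuple `(1,)`) produce the same hash, so it trades some precision for robustness. If that matters, a type tag could be prefixed to the encoded string.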