
[BUG] memoization fails on DataFrames with list columns

Open • lmeyerov opened this issue 4 years ago • 1 comment

Memoization fails for the Neo4j movie example when the bindings contain these DataFrames (note the list-valued `roles` edge column):

{'bindings': {'edges':            roles  _bolt_relationship_id      type  _bolt_start_node_id_key  \
  0         [Emil]                      7  ACTED_IN                        8   
  1            NaN                      6  PRODUCED                        7   
  2            NaN                      5  DIRECTED                        6   
  3            NaN                      4  DIRECTED                        5   
  4  [Agent Smith]                      3  ACTED_IN                        4   
     _bolt_end_node_id_key  
  0                      0  
  1                      0  
  2                      0  
  3                      0  
  4                      0  ,
  'nodes':               name    born  _bolt_node_id_key    type _lbl_Person  \
  0      Emil Eifrem  1978.0                  8  Person        True   
  1              NaN     NaN                  0   Movie         NaN   
  2      Joel Silver  1952.0                  7  Person        True   
  3   Lana Wachowski  1965.0                  6  Person        True   
  4  Lilly Wachowski  1967.0                  5  Person        True   
  5     Hugo Weaving  1960.0                  4  Person        True   
                       tagline       title  released _lbl_Movie  
  0                        NaN         NaN       NaN        NaN  
  1  Welcome to the Real World  The Matrix    1999.0       True  
  2                        NaN         NaN       NaN        NaN  
  3                        NaN         NaN       NaN        NaN  
  4                        NaN         NaN       NaN        NaN  
  5                        NaN         NaN       NaN        NaN  ,
  'source': '_bolt_start_node_id_key',
  'destination': '_bolt_end_node_id_key',
  'node': '_bolt_node_id_key',
  'edge_label': None,
  'edge_color': None,
  'edge_size': None,
  'edge_weight': None,
  'edge_title': None,
  'edge_icon': None,
  'edge_opacity': None,
  'edge_source_color': None,
  'edge_destination_color': None,
  'point_label': None,
  'point_color': None,
  'point_size': None,
  'point_weight': None,
  'point_title': None,
  'point_icon': None,
  'point_opacity': None,
  'point_x': None,
  'point_y': None},
 'settings': {'height': 500, 'url_params': {'info': 'true'}}}

We should reproduce the memoization hash-function failure in isolation and file an issue with pandas or Arrow.

lmeyerov • Feb 08 '21 17:02

Repro:

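import hashlib
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [[1], [2, 2], None],  # object column holding Python lists
})
# Fails: pandas cannot hash the list cells in 'y' (TypeError: unhashable type: 'list')
hashlib.sha256(pd.util.hash_pandas_object(df, index=True).values).hexdigest()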

Potential fix: https://github.com/bra-fsn/hashable_df/blob/master/hashable_df/__init__.py => https://github.com/bra-fsn/autohash/blob/master/autohash/__init__.py
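A minimal workaround sketch, not the hashable_df/autohash implementation: before hashing, replace list and tuple cells in object columns with their `repr()` strings, which pandas can hash. `hash_df` and `_stringify_unhashable` are hypothetical helper names. One caveat: a list cell and a string cell with the same text (e.g. `[1]` vs `"[1]"`) produce the same hash, which is likely acceptable for a memoization key but is a real collision.

```python
import hashlib

import pandas as pd


def _stringify_unhashable(v):
    # Hypothetical helper: lists/tuples cannot go through pandas' object
    # hashing, so use their repr() string instead; other values pass through
    if isinstance(v, (list, tuple)):
        return repr(v)
    return v


def hash_df(df: pd.DataFrame) -> str:
    """Hypothetical memoization key for a DataFrame that may hold list cells."""
    safe = df.copy()  # leave the caller's frame untouched
    for col in safe.columns:
        if safe[col].dtype == object:
            safe[col] = safe[col].map(_stringify_unhashable)
    row_hashes = pd.util.hash_pandas_object(safe, index=True)
    return hashlib.sha256(row_hashes.values).hexdigest()


df = pd.DataFrame({'x': [1, 2, 3], 'y': [[1], [2, 2], None]})
print(hash_df(df))
```

Only object columns are touched, so numeric columns are hashed exactly as in the original repro.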

lmeyerov • Feb 09 '21 02:02