base_generic_feature_statistics_generator.py throws AttributeError

Open · iampatgrady opened this issue 7 years ago · 1 comment

I ran this code on my jupyter notebook:

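import base64
from IPython.display import display, HTML
# Import path assumes facets_overview/python is on sys.path; adjust for your install.
from generic_feature_statistics_generator import GenericFeatureStatisticsGenerator

gfsg = GenericFeatureStatisticsGenerator()  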
proto = gfsg.ProtoFromDataFrames([{'name': 'data', 'table': df}])  
protostr = base64.b64encode(proto.SerializeToString()).decode("utf-8")  
  
HTML_TEMPLATE = """<link rel="import" href="/nbextensions/facets-dist/facets-jupyter.html" >  
        <facets-overview id="elem1"></facets-overview>  
        <script>  
          document.querySelector("#elem1").protoInput = "{protostr}";  
        </script>"""  
  
html = HTML_TEMPLATE.format(protostr=protostr)  
display(HTML(html))  

Got this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-24-e96a5ba70ee7> in <module>()
      1 # Calculate the feature statistics proto from the datasets and stringify it for use in facets overview
      2 gfsg = GenericFeatureStatisticsGenerator()
----> 3 proto = gfsg.ProtoFromDataFrames([{'name': 'data', 'table': df}])
      4 protostr = base64.b64encode(proto.SerializeToString()).decode("utf-8")
      5 

/Users/Pat/Dev/facets/facets_overview/python/base_generic_feature_statistics_generator.py in ProtoFromDataFrames(self, dataframes, histogram_categorical_levels_count)
     59     return self.GetDatasetsProto(
     60       datasets,
---> 61       histogram_categorical_levels_count=histogram_categorical_levels_count)
     62 
     63   def DtypeToType(self, dtype):

/Users/Pat/Dev/facets/facets_overview/python/base_generic_feature_statistics_generator.py in GetDatasetsProto(self, datasets, features, histogram_categorical_levels_count)
    261               for item in value['vals']:
    262                 strs.append(item if hasattr(item, '__len__') else
--> 263                   item.encode('utf-8'))
    264 
    265               featstats.avg_length = np.mean(np.vectorize(len)(strs))

AttributeError: 'int' object has no attribute 'encode'

Following some advice I found on StackOverflow, I fixed this by changing line 263 of base_generic_feature_statistics_generator.py to:

for item in value['vals']:
  strs.append(item if hasattr(item, '__len__') else
              repr(item).encode('utf-8'))  # added repr(item)

I wrapped the item in repr() before calling encode(), and that seems to work now. I think y'all aren't accepting pull requests, so here is my bug report; I'm submitting it mainly for the folks who run into this via Google!
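For anyone landing here from a search, here is a minimal sketch of why the original line fails and what the patch does. It assumes Python 3, and `to_bytes` is a hypothetical helper name used only for illustration; it is not part of Facets.

```python
def to_bytes(item):
    """Return a UTF-8 bytes form of item (hypothetical helper, not Facets code)."""
    if isinstance(item, bytes):
        return item                      # already bytes, nothing to do
    if isinstance(item, str):
        return item.encode('utf-8')      # text has .encode()
    # ints, floats, etc. have no .encode(); go through a text form first,
    # which is what wrapping the item in repr() achieves in the patch above.
    return repr(item).encode('utf-8')


vals = ['abc', 7, 2.5, b'raw']           # a mixed column like the one that failed
strs = [to_bytes(v) for v in vals]
lengths = [len(s) for s in strs]         # lengths can now be computed for every item

# The unpatched behaviour: calling .encode() directly on an int raises.
try:
    (7).encode('utf-8')
    int_has_encode = True
except AttributeError:
    int_has_encode = False
```

Checking the type explicitly, rather than testing for `__len__`, is a design choice of this sketch and may not match what the maintainers would want.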

To get it working I had to delete the /Users/Pat/Dev/facets/facets_overview/python/base_generic_feature_statistics_generator.pyc file and also restart the notebook kernel.
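If restarting the kernel is inconvenient, something like the sketch below might work instead. Both function names are hypothetical, and the approach assumes the module can be imported by name in your notebook. Python normally regenerates bytecode when the source file changes, so the reload (or restart) is probably the part that matters.

```python
import importlib
import pathlib


def remove_stale_bytecode(source_dir):
    """Delete compiled *.pyc files under source_dir and return the removed paths."""
    removed = []
    for pyc in pathlib.Path(source_dir).rglob('*.pyc'):
        pyc.unlink()
        removed.append(pyc)
    return removed


def reload_module(module_name):
    """Import module_name if needed, then re-execute it so edited source takes effect."""
    module = importlib.import_module(module_name)
    return importlib.reload(module)
```

After reloading, objects created earlier (such as an existing `gfsg`) still use the old class, so they need to be created again.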

Perhaps there is a better way to handle this? Let me know, I'm rather ignorant of the bug filing process.

iampatgrady · Mar 27 '18 22:03

Thanks for the bug report!

What is the dataframe from which you were generating feature statistics? Is it private, or would you be willing to share it to help debug this? It seems like in your case, the feature (column) of the dataframe on which it failed was some type of int, but facets' feature statistics generator thought it was some sort of non-numeric type (otherwise it wouldn't be executing line 263 of base_generic_feature_statistics_generator.py).

Facets decides if a feature in a dataframe is a numeric or non-numeric type using this logic: https://github.com/PAIR-code/facets/blob/master/facets_overview/python/base_generic_feature_statistics_generator.py#L63
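Here is a rough illustration of how this kind of mismatch can arise. It assumes the failing column had dtype `object`, which is not confirmed in this thread. `is_numeric_dtype_sketch` is a hypothetical stand-in for a dtype-based check, not Facets' actual code; the link above has the real logic.

```python
import pandas as pd


def is_numeric_dtype_sketch(dtype):
    """Hypothetical dtype check: signed/unsigned int, float, or bool counts as numeric."""
    return dtype.kind in 'iufb'


df = pd.DataFrame({
    'clean_ints': [1, 2, 3],     # stored with an integer dtype
    'mixed': [1, 'two', 3],      # mixed values, stored with dtype object
})

for name in df.columns:
    col = df[name]
    print(name, col.dtype, 'numeric' if is_numeric_dtype_sketch(col.dtype) else 'non-numeric')

# The 'mixed' column is classified as non-numeric by its dtype,
# yet some of its individual values are plain Python ints with no .encode().
first_mixed_value = df['mixed'].iloc[0]
```

Running `df.dtypes` on the real dataframe would show whether any column that looks numeric is actually stored as `object`.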

jameswex · Mar 28 '18 14:03