
Person model returns a CorporationName label

Open mlollo opened this issue 6 years ago • 0 comments

At probablepeople/__init__.py line 84, the string 'M.A. HSG in Law Seraina Williams' is tagged with 'CorporationName' labels even though the type is set to 'person'.

raw_string: 'M.A. HSG in Law Seraina Williams'
tokens: ['M.A.', 'HSG', 'in', 'Law', 'Seraina', 'Williams']
tags: ['GivenName', 'CorporationName', 'CorporationName', 'CorporationName', 'CorporationName', 'Surname']

This issue breaks the dedupe workflow in parseratorvariable (https://github.com/dedupeio/parseratorvariable). Since I'm using the Person Name FieldType, the following exception is raised:

Exception in thread Thread-6:
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/computer/Documents/projects/dedupe/.venv/lib/python3.6/site-packages/dedupe/core.py", line 76, in __call__
    filtered_pairs = self.fieldDistance(record_pairs)
  File "/Users/computer/Documents/projects/dedupe/.venv/lib/python3.6/site-packages/dedupe/core.py", line 101, in fieldDistance
    distances = self.data_model.distances(records)
  File "/Users/computer/Documents/projects/dedupe/.venv/lib/python3.6/site-packages/dedupe/datamodel.py", line 82, in distances
    record_2[field])
  File "/Users/computer/Documents/projects/dedupe/.venv/lib/python3.6/site-packages/parseratorvariable/__init__.py", line 90, in comparator
    variable_type = self.variable_types[variable_type_1]
KeyError: 'Corporation'

Either parseratorvariable does not handle the case where probablepeople returns a label of the wrong type, or probablepeople should raise an error when a label doesn't correspond to the requested type 'person'.
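To illustrate the second option, here is a minimal sketch of a check that rejects corporation labels in a person parse. It runs on the tokens and tags quoted above rather than calling probablepeople, and `CORPORATION_LABELS`, `UnexpectedLabelError` and `check_person_labels` are hypothetical names made up for this example, not part of either library.

```python
# Assumption: the only label treated as "not a person label" is the one seen
# in this issue. The real label set of the person model is not reproduced here.
CORPORATION_LABELS = {"CorporationName"}


class UnexpectedLabelError(ValueError):
    """Hypothetical error for labels that do not fit the requested type."""


def check_person_labels(tokens, tags):
    """Return (token, tag) pairs, or raise if any tag is a corporation label."""
    pairs = list(zip(tokens, tags))
    offending = [(tok, tag) for tok, tag in pairs if tag in CORPORATION_LABELS]
    if offending:
        raise UnexpectedLabelError(
            f"labels not valid for type 'person': {offending}"
        )
    return pairs


# Tokens and tags copied from the output reported in this issue.
tokens = ['M.A.', 'HSG', 'in', 'Law', 'Seraina', 'Williams']
tags = ['GivenName', 'CorporationName', 'CorporationName',
        'CorporationName', 'CorporationName', 'Surname']

try:
    check_person_labels(tokens, tags)
except UnexpectedLabelError as exc:
    print(exc)
```

A caller such as parseratorvariable could catch an error like this and fall back to a plain string comparison instead of failing with a KeyError, but that is a design choice for the maintainers.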

For those who want to patch this, see https://github.com/dedupeio/parseratorvariable/issues/3

mlollo, May 01 '18 18:05