probablepeople
Person model returns a CorporationName label
In probablepeople/__init__.py, line 84, the string 'M.A. HSG in Law Seraina Williams' gets 'CorporationName' labels even though the type is set to 'person'.
raw_string : 'M.A. HSG in Law Seraina Williams'
tokens : ['M.A.', 'HSG', 'in', 'Law', 'Seraina', 'Williams']
tags : ['GivenName', 'CorporationName', 'CorporationName', 'CorporationName', 'CorporationName', 'Surname']
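The mismatch can be checked mechanically from the output above. This is a minimal sketch that uses only the tokens and tags reported in this issue. The helper name `corporation_tokens` is hypothetical and not part of probablepeople, and the check assumes that corporation labels can be recognized by the 'Corporation' prefix.

```python
# Tokens and tags exactly as reported in this issue.
tokens = ['M.A.', 'HSG', 'in', 'Law', 'Seraina', 'Williams']
tags = ['GivenName', 'CorporationName', 'CorporationName',
        'CorporationName', 'CorporationName', 'Surname']


def corporation_tokens(tokens, tags):
    """Return (token, tag) pairs whose tag starts with 'Corporation'.

    Hypothetical helper: a person parse should ideally yield no such pairs.
    """
    return [(tok, tag) for tok, tag in zip(tokens, tags)
            if tag.startswith('Corporation')]


offending = corporation_tokens(tokens, tags)
print(offending)
```

A non-empty result for input parsed with type 'person' is the situation described here: four of the six tokens carry a corporation label.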
This issue is breaking the dedupe workflow in parseratorvariable (https://github.com/dedupeio/parseratorvariable).
Since I'm using the Person Name FieldType, the following exception is raised:
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/computer/Documents/projects/dedupe/.venv/lib/python3.6/site-packages/dedupe/core.py", line 76, in __call__
    filtered_pairs = self.fieldDistance(record_pairs)
  File "/Users/computer/Documents/projects/dedupe/.venv/lib/python3.6/site-packages/dedupe/core.py", line 101, in fieldDistance
    distances = self.data_model.distances(records)
  File "/Users/computer/Documents/projects/dedupe/.venv/lib/python3.6/site-packages/dedupe/datamodel.py", line 82, in distances
    record_2[field])
  File "/Users/computer/Documents/projects/dedupe/.venv/lib/python3.6/site-packages/parseratorvariable/__init__.py", line 90, in comparator
    variable_type = self.variable_types[variable_type_1]
KeyError: 'Corporation'
Either parseratorvariable is not handling the case where probablepeople returns a wrong label, or probablepeople is not raising an error when a label doesn't correspond to the type 'person'.
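To illustrate the first option, here is a rough sketch of a lookup that tolerates an unexpected type name instead of raising KeyError. It is not parseratorvariable's actual code: the function `lookup_variable_type`, the `fallback` parameter, and the contents of `person_types` are all assumptions made up for this example.

```python
def lookup_variable_type(variable_types, name, fallback=None):
    """Return variable_types[name], or `fallback` if the name is unknown.

    Hypothetical helper: it avoids the bare KeyError shown in the traceback.
    """
    try:
        return variable_types[name]
    except KeyError:
        return fallback


# Hypothetical mapping for a person-only field; the real contents are unknown.
person_types = {'Person': 'person-comparator'}

# 'Corporation' is the key that triggered the KeyError in this issue.
result = lookup_variable_type(person_types, 'Corporation')
print(result)
```

What the caller should do with the fallback (treat the pair as not comparable, or raise a clearer error) is a separate design decision.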
For those who want to patch this, see https://github.com/dedupeio/parseratorvariable/issues/3