'Namespace' object has no attribute 'use_containing' in dedupe_geojson.py
Is the EMR dedupe geojson job missing options? I am seeing the following errors when trying to run lieu (f55fe8bf232525679baac4c3db1387cb37d16e14) in EMR:
No handlers could be found for logger "mrjob.launch"
Traceback (most recent call last):
  File "dedupe_geojson.py", line 204, in <module>
    DedupeGeoJSONJob.run()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 436, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 454, in execute
    self.run_spark(self.options.step_num)
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 647, in run_spark
    spark_method(input_path, output_path)
  File "dedupe_geojson.py", line 137, in spark
    use_containing = self.options.use_containing
AttributeError: 'Namespace' object has no attribute 'use_containing'
It's not clear to me if, how, or where the args are defined or inherited, in part because the use_city = self.options.use_city statement on line 136 does not appear to fail, even though it doesn't seem to be defined anywhere either...
grep -n -r -e 'use_containing' /usr/local/openvenues/lieu/scripts/jobs
.//dedupe_geojson.py:137: use_containing = self.options.use_containing
.//dedupe_geojson.py:152: dupes_with_classes_and_sims = NameAddressDeduperSpark.dupe_sims(address_ids, geo_model=geo_model, geo_model_proportion=geo_model_proportion, index_type=index_type, name_dupe_threshold=name_dupe_threshold, name_review_threshold=name_review_threshold, with_address=with_address, with_unit=with_unit, with_latlon=use_latlon, with_city_or_equivalent=use_city, with_small_containing_boundaries=use_containing, with_postal_code=use_postal_code, fuzzy_street_name=fuzzy_street_name, with_phone_number=with_phone_number, name_and_address_keys=name_and_address_keys, name_only_keys=name_only_keys, address_only_keys=address_only_keys)
.//dedupe_geojson.py:154: dupes_with_classes_and_sims = AddressDeduperSpark.dupe_sims(address_ids, with_unit=with_unit, with_latlon=use_latlon, with_city_or_equivalent=use_city, with_small_containing_boundaries=use_containing, with_postal_code=use_postal_code, fuzzy_street_name=fuzzy_street_name)
grep -n -r -e 'use_containing' /usr/local/openvenues/lieu/scripts/jobs | wc -l
0
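For what it's worth, the 'Namespace' in the error is an argparse Namespace, so the behaviour can be reproduced without mrjob. The following is only a minimal sketch using the standard library, not lieu's actual option setup: the `--use-city` flag here is a hypothetical declaration. It shows that an option declared with hyphens is read back with underscores (which might be one reason a search for the underscore name turns up no definition), and that reading an attribute that was never declared raises the same AttributeError.

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical declaration: the flag is spelled with a hyphen,
# but argparse stores the parsed value under the name `use_city`.
parser.add_argument('--use-city', action='store_true', default=False)

options = parser.parse_args([])

# Declared option: available as an attribute, with its default value.
print(options.use_city)

# Undeclared option: the Namespace has no such attribute at all.
try:
    options.use_containing
except AttributeError as err:
    missing_error = str(err)
    print(missing_error)
```

In mrjob 0.6 and later, options like these are normally declared in the job's configure_args method with add_passthru_arg. Whether that is where lieu is meant to declare use_containing is an assumption on my part.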
Similarly:
No handlers could be found for logger "mrjob.launch"
Traceback (most recent call last):
  File "dedupe_geojson.py", line 206, in <module>
    DedupeGeoJSONJob.run()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 436, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 454, in execute
    self.run_spark(self.options.step_num)
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 647, in run_spark
    spark_method(input_path, output_path)
  File "dedupe_geojson.py", line 149, in spark
    name_only = self.options.name_only
AttributeError: 'Namespace' object has no attribute 'name_only'

No handlers could be found for logger "mrjob.launch"
Traceback (most recent call last):
  File "dedupe_geojson.py", line 213, in <module>
    DedupeGeoJSONJob.run()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 436, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 454, in execute
    self.run_spark(self.options.step_num)
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 647, in run_spark
    spark_method(input_path, output_path)
  File "dedupe_geojson.py", line 184, in spark
    with_unit=self.options.with_unit)
TypeError: explain_name_address_dupe() got an unexpected keyword argument 'name_dupe_threshold'
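This last error is a different kind of failure from the first two: the call site passes a keyword that the function does not accept. A rough sketch of how to spot such mismatches is below. The explain_dupe function and the unsupported_kwargs helper are hypothetical stand-ins, not lieu code, and the sketch relies on Python 3's inspect.signature (on 2.7, inspect.getargspec would be the nearest equivalent).

```python
import inspect


def explain_dupe(a, b, with_unit=False):
    """Hypothetical stand-in for a helper that accepts fewer options than its caller passes."""
    return (a, b, with_unit)


def unsupported_kwargs(func, kwargs):
    """Return the keyword names in `kwargs` that `func` does not accept."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing is unsupported.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(name for name in kwargs if name not in params)


call_kwargs = {'with_unit': True, 'name_dupe_threshold': 0.9}

# Calling with the extra keyword reproduces the TypeError.
try:
    explain_dupe('a', 'b', **call_kwargs)
except TypeError as err:
    type_error = str(err)
    print(type_error)

# Checking the signature first names the offending keyword without calling.
extra = unsupported_kwargs(explain_dupe, call_kwargs)
print(extra)
```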
FWIW, the following changes fix all of the errors above, although I can't be sure I'm not glossing over some important details:
https://github.com/openvenues/lieu/compare/master...sfomuseum:debug