
'Namespace' object has no attribute 'use_containing' in dedupe_geojson.py

Open thisisaaronland opened this issue 6 years ago • 3 comments

Is the EMR dedupe geojson job missing options? I am seeing the following errors when trying to run lieu (f55fe8bf232525679baac4c3db1387cb37d16e14) in EMR:

No handlers could be found for logger "mrjob.launch"
Traceback (most recent call last):
  File "dedupe_geojson.py", line 204, in <module>
    DedupeGeoJSONJob.run()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 436, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 454, in execute
    self.run_spark(self.options.step_num)
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 647, in run_spark
    spark_method(input_path, output_path)
  File "dedupe_geojson.py", line 137, in spark
    use_containing = self.options.use_containing
AttributeError: 'Namespace' object has no attribute 'use_containing'

It's not clear to me if, how, or where args are defined or inherited, in part because the use_city = self.options.use_city statement on line 136 does not appear to fail, even though it doesn't seem to be defined anywhere either...

grep -n -r -e 'use_containing' /usr/local/openvenues/lieu/scripts/jobs
.//dedupe_geojson.py:137:        use_containing = self.options.use_containing
.//dedupe_geojson.py:152:            dupes_with_classes_and_sims = NameAddressDeduperSpark.dupe_sims(address_ids, geo_model=geo_model, geo_model_proportion=geo_model_proportion, index_type=index_type, name_dupe_threshold=name_dupe_threshold, name_review_threshold=name_review_threshold, with_address=with_address, with_unit=with_unit, with_latlon=use_latlon, with_city_or_equivalent=use_city, with_small_containing_boundaries=use_containing, with_postal_code=use_postal_code, fuzzy_street_name=fuzzy_street_name, with_phone_number=with_phone_number, name_and_address_keys=name_and_address_keys, name_only_keys=name_only_keys, address_only_keys=address_only_keys)
.//dedupe_geojson.py:154:            dupes_with_classes_and_sims = AddressDeduperSpark.dupe_sims(address_ids, with_unit=with_unit, with_latlon=use_latlon, with_city_or_equivalent=use_city, with_small_containing_boundaries=use_containing, with_postal_code=use_postal_code, fuzzy_street_name=fuzzy_street_name)
grep -n -r -e 'use_containing' /usr/local/openvenues/lieu/scripts/jobs | wc -l
       0
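
For context, the 'Namespace' in the error message suggests the job options are parsed with argparse, where an attribute only exists on the parsed object if the option was declared on the parser. The following is a minimal sketch of that behaviour using plain argparse rather than mrjob itself; `build_parser` and the `--use-city` / `--use-containing` flag names are assumptions made for illustration, not the options lieu actually declares. In recent mrjob versions the equivalent declaration would, as far as I know, live in the job's `configure_args` method via `add_passthru_arg`.

```python
import argparse


def build_parser(declare_containing):
    """Hypothetical stand-in for the job's option parser."""
    parser = argparse.ArgumentParser()
    # Declared option: always present on the Namespace, even if not passed.
    parser.add_argument("--use-city", dest="use_city",
                        action="store_true", default=False)
    if declare_containing:
        # Declaring the option (with a default) is what creates the attribute.
        parser.add_argument("--use-containing", dest="use_containing",
                            action="store_true", default=False)
    return parser


# Without the declaration, reading the attribute raises AttributeError.
options = build_parser(declare_containing=False).parse_args([])
try:
    options.use_containing
    missing = False
except AttributeError as exc:
    missing = True
    print(exc)

# With the declaration, the attribute exists and falls back to its default.
fixed = build_parser(declare_containing=True).parse_args([])
print(fixed.use_city, fixed.use_containing)
```

If this is what is happening, a grep for the option's declaration (rather than its use) should come back empty for use_containing but not for use_city.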

thisisaaronland · Jul 23 '18 19:07

Similarly:

No handlers could be found for logger "mrjob.launch"
Traceback (most recent call last):
  File "dedupe_geojson.py", line 206, in <module>
    DedupeGeoJSONJob.run()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 436, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 454, in execute
    self.run_spark(self.options.step_num)
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 647, in run_spark
    spark_method(input_path, output_path)
  File "dedupe_geojson.py", line 149, in spark
    name_only = self.options.name_only
AttributeError: 'Namespace' object has no attribute 'name_only'

thisisaaronland · Jul 23 '18 21:07

No handlers could be found for logger "mrjob.launch"
Traceback (most recent call last):
  File "dedupe_geojson.py", line 213, in <module>
    DedupeGeoJSONJob.run()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 436, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 454, in execute
    self.run_spark(self.options.step_num)
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 647, in run_spark
    spark_method(input_path, output_path)
  File "dedupe_geojson.py", line 184, in spark
    with_unit=self.options.with_unit)
TypeError: explain_name_address_dupe() got an unexpected keyword argument 'name_dupe_threshold'
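
A TypeError like this one usually means the call site passes a keyword that the target function's signature does not accept. Below is a small debugging sketch, assuming Python 3 (`inspect.signature` is not available on the Python 2.7 shown in the traceback); `explain_dupe` and `supported_kwargs` are hypothetical stand-ins, not lieu functions, and the real signature of explain_name_address_dupe() is not reproduced here.

```python
import inspect


def explain_dupe(a, b, with_unit=False):
    """Hypothetical stand-in for a function with a narrower signature."""
    return (a, b, with_unit)


def supported_kwargs(func, kwargs):
    """Return only the keyword arguments that func's signature accepts."""
    params = inspect.signature(func).parameters
    # A function that takes **kwargs accepts everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


call_kwargs = {"with_unit": True, "name_dupe_threshold": 0.9}

# Passing an unknown keyword reproduces the same class of error.
try:
    explain_dupe("a", "b", **call_kwargs)
except TypeError as exc:
    print(exc)

# Listing the rejected keywords shows which arguments are out of sync.
accepted = supported_kwargs(explain_dupe, call_kwargs)
rejected = sorted(set(call_kwargs) - set(accepted))
print(rejected)
print(explain_dupe("a", "b", **accepted))
```

Filtering keywords this way is only useful for finding the mismatch; silently dropping an argument such as a threshold would change the job's behaviour, so the real fix is to bring the caller and the function signature back in line.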

thisisaaronland · Jul 23 '18 22:07

FWIW, the following changes fix all the errors above although I can't be sure I'm not glossing over some important details...

https://github.com/openvenues/lieu/compare/master...sfomuseum:debug

thisisaaronland · Jul 25 '18 01:07