synthpop icon indicating copy to clipboard operation
synthpop copied to clipboard

Parallel py3

Open mxndrwgrdnr opened this issue 6 years ago • 7 comments

  • updated synthpop for python 3
  • updated the census attributes retained in starter2.py
  • updated ipu.py so that rather than erroring out when max_iterations is reached, it generates a warning.
  • added method synthesize_all_in_parallel() to synthesizer.py for processing sub-county geographies in parallel.

Could use an extra set of eyes at L264-265 in synthesizer.py. These two lines account for the fact that when processing geographies in parallel, draw.draw_households() cannot rely on the hh_index_start argument to set the household_id either as the index of the synthetic households table or the hh_id field of the synthetic persons table. Thus we have to adjust the households index and persons hh_id field AFTER all the tables have been generated, which is what's hapenning at L264-265.

mxndrwgrdnr avatar Apr 04 '18 19:04 mxndrwgrdnr

Coverage Status

Coverage decreased (-1.2%) to 81.643% when pulling c18011629a67aa14ac525c4eea22fd6a74171b70 on parallel_py3 into d1de14a2d70a83d366294579f792e8b2ee1c9b9a on master.

coveralls avatar Apr 04 '18 22:04 coveralls

Thanks very much for this great contribution @mxndrwgrdnr! We'll review this soon, sorry for the delay. There is an existing earlier open PR that was reveiewed, which we kept open for a bit in the case anyone from the community wanted to review, that we'll merge in first. And then we can help reconcile with this PR.

janowicz avatar Apr 18 '18 16:04 janowicz

@cvanoli @janowicz this should be good to go now folks.

mxndrwgrdnr avatar Jun 28 '18 01:06 mxndrwgrdnr

Awesome, thanks @mxndrwgrdnr! I'll check it out today

janowicz avatar Jun 28 '18 14:06 janowicz

thanks !! @mxndrwgrdnr

cvanoli avatar Jun 28 '18 15:06 cvanoli

I tested the branch with a local environment in my computer and two Future Warnings keep coming up: 1 -synthpop\ipu\ipu.py: Line191

RuntimeWarning: divide by zero encountered in double_scalars adj = constraint / float((column * weights).sum())

2 - synthpop\census_helpers.py: Line 180:

FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default. To accept the future behavior, pass 'sort=False'. To retain the current behavior and silence the warning, pass 'sort=True'. pums = pd.concat([pums, pums00], ignore_index=True)

cvanoli avatar Jul 03 '18 15:07 cvanoli

@janowicz the second FutureWarning that @cvanoli mentions isn't a big deal and should resolve itself in the future. The first FutureWarning, however, is happening during IPU when household weights that are very nearly zero are not close enough to zero to be caught by ipu._drop_zeros but close enough to trigger a divide by zero RuntimeWarning. It's just a warning and not an error so I don't think it is effecting performance but I wanted to bring it to attention and see if you had an opinion.

mxndrwgrdnr avatar Jul 03 '18 18:07 mxndrwgrdnr