synthpop
synthpop copied to clipboard
Parallel py3
- updated synthpop for python 3
- updated the census attributes retained in starter2.py
- updated ipu.py so that rather than erroring out when
max_iterations
is reached, it generates a warning. - added method
synthesize_all_in_parallel()
to synthesizer.py for processing sub-county geographies in parallel.
Could use an extra set of eyes at L264-265 in synthesizer.py. These two lines account for the fact that when processing geographies in parallel, draw.draw_households()
cannot rely on the hh_index_start
argument to set the household_id
either as the index of the synthetic households table or the hh_id
field of the synthetic persons table. Thus we have to adjust the households index and persons hh_id
field AFTER all the tables have been generated, which is what's hapenning at L264-265.
Coverage decreased (-1.2%) to 81.643% when pulling c18011629a67aa14ac525c4eea22fd6a74171b70 on parallel_py3 into d1de14a2d70a83d366294579f792e8b2ee1c9b9a on master.
Thanks very much for this great contribution @mxndrwgrdnr! We'll review this soon, sorry for the delay. There is an existing earlier open PR that was reveiewed, which we kept open for a bit in the case anyone from the community wanted to review, that we'll merge in first. And then we can help reconcile with this PR.
@cvanoli @janowicz this should be good to go now folks.
Awesome, thanks @mxndrwgrdnr! I'll check it out today
thanks !! @mxndrwgrdnr
I tested the branch with a local environment in my computer and two Future Warnings keep coming up: 1 -synthpop\ipu\ipu.py: Line191
RuntimeWarning: divide by zero encountered in double_scalars
adj = constraint / float((column * weights).sum())
2 - synthpop\census_helpers.py: Line 180:
FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default. To accept the future behavior, pass 'sort=False'. To retain the current behavior and silence the warning, pass 'sort=True'.
pums = pd.concat([pums, pums00], ignore_index=True)
@janowicz the second FutureWarning that @cvanoli mentions isn't a big deal and should resolve itself in the future. The first FutureWarning, however, is happening during IPU when household weights that are very nearly zero are not close enough to zero to be caught by ipu._drop_zeros
but close enough to trigger a divide by zero RuntimeWarning. It's just a warning and not an error so I don't think it is effecting performance but I wanted to bring it to attention and see if you had an opinion.