
The same updater on Partitions with different units gives incorrect Election results


Consider this setup, where we try to generate two GerryChain Partitions, one built from the North Carolina Census block dual graph and the other from the VTD dual graph.

from gerrychain import Graph, Partition, Election

graph_blocks = Graph.from_json("nc_bcks.json")
graph_vtds   = Graph.from_json("NC_full_vote_vtd.json")

We want to track the outcome of any election for which we have data on both dual graphs. For example, we can use the 2020 Governor's election, where the Democratic and Republican candidates' vote totals are stored in the CooperD_20G_Governor and ForestR_20G_Governor columns, respectively. We can check that the vote totals for each candidate are the same in both dual graphs:

import math

for party_column in ["CooperD_20G_Governor", "ForestR_20G_Governor"]:
    block_sum = sum([graph_blocks.nodes[n][party_column] for n in graph_blocks.nodes])
    vtd_sum   = sum([graph_vtds.nodes[n][party_column] for n in graph_vtds.nodes])
    assert math.isclose(block_sum, vtd_sum)

But if we use an Election updater to track this election in a GerryChain Partition, votes are lost in the VTD partition! The final assert statement raises an AssertionError:

updaters = {"GOV20": Election("GOV20", {"Dem":"CooperD_20G_Governor", "Rep":"ForestR_20G_Governor"})}

# this bug is independent of the partition assignment, so here we assign every node to district 0
partition_blocks = Partition(graph_blocks, {n:0 for n in graph_blocks.nodes}, updaters)
partition_vtds   = Partition(graph_vtds, {n:0 for n in graph_vtds.nodes}, updaters)

assert math.isclose(partition_blocks["GOV20"].totals[0],partition_vtds["GOV20"].totals[0])
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/var/folders/nc/1qx970qn7p17yq9cmwrq9fmw0000gn/T/ipykernel_96944/2002769006.py in <module>
----> 1 assert math.isclose(partition_blocks["GOV20"].totals[0],partition_vtds["GOV20"].totals[0])

AssertionError: 

This error propagates outward, so it will be most visible to the user as:

partition_blocks["GOV20"].percent("Dem") == 0.5228991188220035
partition_blocks["GOV20"].percent("Rep") == 0.5408567558910189

...where only the blocks Partition is correct.

This bug disappears if you pass distinct (though identical) updater dictionaries to each partition:

updaters_blocks = {"GOV20": Election("GOV20", {"Dem":"CooperD_20G_Governor", "Rep":"ForestR_20G_Governor"})}
updaters_vtds   = {"GOV20": Election("GOV20", {"Dem":"CooperD_20G_Governor", "Rep":"ForestR_20G_Governor"})}

partition_blocks = Partition(graph_blocks, {n:0 for n in graph_blocks.nodes}, updaters_blocks)
partition_vtds   = Partition(graph_vtds, {n:0 for n in graph_vtds.nodes}, updaters_vtds)

assert math.isclose(partition_blocks["GOV20"].totals[0],partition_vtds["GOV20"].totals[0])

@apizzimenti can explain further why this is happening, and the fix is due to him!

gabeschoenbach commented on Nov 17, 2021

This bug is a subtle one and should crop up only when both of the following hold:

  1. the vertices in each dual graph (in the above example corresponding to blocks and VTDs) are integer-indexed such that the vertex indices of one dual graph are a subset of the other dual graph's vertex indices; and
  2. the same dictionary of updaters is used for each Partition.

The Partition class, when initialized, checks whether an argument has been passed to the parent parameter. If not, it assumes it's the first Partition in a series of Partitions and calls the _first_time() private method. Among other things, this method assigns the passed dictionary of user-defined updaters to the Partition instance's Partition.updaters property, but does so in a sneakily cheap way: while it .copy()s the dictionary of default updaters before assignment, it does not .copy() the dictionary of user-defined updaters. As a result, the dictionary of updaters is shared across whichever Partition objects were instantiated with them.
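
To see why that matters, here is a minimal sketch of the aliasing (the FakePartition class is a hypothetical stand-in, not GerryChain's actual code):

class FakePartition:
    def __init__(self, updaters):
        # Analogous to _first_time(): the user's dictionary is assigned
        # without being copied, so every partition aliases the same dict.
        self.updaters = updaters

shared = {"GOV20": object()}  # stands in for a stateful Election updater
p1 = FakePartition(shared)
p2 = FakePartition(shared)
assert p1.updaters is p2.updaters                    # same dict object
assert p1.updaters["GOV20"] is p2.updaters["GOV20"]  # same updater instance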

However, unintentional (and undesirable) reference-sharing across Partition instances is not, in itself, the cause of this bug. Partitions maintain an internal assignment which maps dual graph vertex indices to district numbers; broadly, updaters (like Tallys and Elections) use this mapping to compute districts' statistics. To save time, DataTally instances iterate completely over this assignment only once, then recalculate statistics based on which vertices have moved (rather than recomputing entirely). The keys in this mapping are the identifiers specified when the dual graph was created – typically, these are integers or GEOIDs. In the above example, both dual graphs' vertices are integer-indexed, so the VTD indices are a subset of the block indices.

When the Election updaters are initialized, they internally create DataTally objects to tally up votes for the specified parties based on the current Partition's assignment. Because these Elections are initialized only once yet shared between both Partitions – and neither Partition is ever iterated, i.e. its assignment is never modified by flipping vertices from one district to another – the same stateful Election updaters are evaluated against two different, similarly-indexed assignments. State carried over from the first evaluation is silently reused for the second, so we get conflicting, overlapping, or missing district tallies, causing funky or impossible values like the ones above.
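
As a toy model of that failure mode, consider a tally that snapshots node data on its first evaluation and reuses it thereafter (the CachingTally class below is hypothetical; GerryChain's DataTally is more elaborate, but the statefulness is analogous):

class CachingTally:
    def __init__(self, column):
        self.column = column
        self.node_data = None  # populated on first use, then reused

    def __call__(self, graph, assignment):
        if self.node_data is None:
            # First evaluation: do the one full pass over this graph.
            self.node_data = {n: graph[n][self.column] for n in graph}
        # Later evaluations reuse that state, even on a different graph
        # whose integer indices happen to overlap.
        totals = {}
        for node, district in assignment.items():
            totals[district] = totals.get(district, 0) + self.node_data[node]
        return totals

blocks = {0: {"votes": 10}, 1: {"votes": 20}, 2: {"votes": 30}}  # "block" graph
vtds   = {0: {"votes": 35}, 1: {"votes": 25}}  # "VTD" graph, same 60-vote total

tally = CachingTally("votes")                 # one shared instance
print(tally(blocks, {n: 0 for n in blocks}))  # {0: 60} -- correct
print(tally(vtds, {n: 0 for n in vtds}))      # {0: 30} -- stale block data looked
                                              # up for VTD indices 0 and 1 only,
                                              # so votes are "lost"

Because the VTD indices are a subset of the block indices, every lookup in the second call succeeds and the corruption is silent; with disjoint indices it would instead raise a KeyError.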

The most sensible solution is to deep-copy dictionaries of updaters when instantiating Partitions.
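
In sketch form (hypothetical placement; the real change belongs in Partition's initialization):

import copy

class FakePartition:
    def __init__(self, updaters):
        # Each partition gets its own copies of the updater objects, so
        # per-updater caches can no longer leak across partitions.
        self.updaters = copy.deepcopy(updaters)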

pizzimathy commented on Nov 30, 2021

We should throw a UserWarning when this occurs, I think (should be easy to detect).
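
For example (a hypothetical detection sketch, not the implementation that was merged), the library could remember which updater dictionaries it has already seen and warn on reuse:

import warnings

# Hypothetical sketch: track the identity of updater dicts already used
# to initialize a partition, and warn when one is reused. (id()-based
# bookkeeping is fragile across object lifetimes; a real fix would need
# more care.)
_seen_updater_dicts = set()

def warn_if_shared(updaters):
    if id(updaters) in _seen_updater_dicts:
        warnings.warn(
            "This updater dictionary was already used by another Partition; "
            "sharing stateful updaters across partitions can corrupt results.",
            UserWarning,
        )
    _seen_updater_dicts.add(id(updaters))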

InnovativeInventor commented on Dec 27, 2021

So throw a UserWarning and do not deep-copy dictionaries? Or do both?

pizzimathy commented on Jan 4, 2022

Both, I think.

InnovativeInventor commented on Jan 4, 2022

Love it – I'll put a PR and some tests in over the coming week and close this issue when I'm done.

pizzimathy commented on Jan 4, 2022

It looks like this issue has been resolved in #375, so I am going to close this issue. Please let me know if I am mistaken and we need to reopen it.

peterrrock2 commented on Jan 2, 2024