Freeze / Infinite loop in ComparisonAuditController.java
When a contest has a discrepancy and requires a second round of auditing, and the random selection contains multiple duplicates, the system can enter an infinite loop during the audit.
This happens at the end of the second round, because the system incorrectly thinks there is still more auditing to do. The web interface becomes unresponsive and freezes while the county waits for an (unnecessary) third round to launch. The loop is in ComparisonAuditController.java at while (sorted_deduplicated_new_cvrs.isEmpty()) {
https://github.com/FreeAndFair/ColoradoRLA/blob/430c0d7bd73f9f76522c7d061d535fc41707c559/server/eclipse-project/src/main/java/us/freeandfair/corla/controller/ComparisonAuditController.java#L541-L564
The fix is to delete the line
cvrai.setCounted(cvrai.counted() + multiplicity);
in ComparisonAuditController.java as shown in https://github.com/democracyworks/ColoradoRLA/pull/17/commits/ec87ea6b69fecee549a4ec4fc688cea7cc638856
See a more detailed explanation, and the fix in context, in the DemocracyWorks fork: https://github.com/democracyworks/ColoradoRLA/pull/17
That's an interesting one, and I'm amazed we didn't run into it during all of our testing for Colorado. I'll integrate a fix when I get a chance. I'm not certain this is the right fix, though: when the multiplicity is added, the actual occurrences of that ballot should be removed from the list at the same time (the list is supposed to be "deduplicated"), so the ballot should not come up again during the audit and should not be counted again. The fact that this isn't happening is confusing to me, and I'd be interested in watching what happens during a run that exhibits the bug.
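To make the intended behavior concrete, here is a minimal sketch of deduplicating a selection sequence while recording each ballot's multiplicity. It is not the ColoradoRLA implementation: the class name DedupSketch, the method multiplicities, and the sample IDs are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupSketch {
  /**
   * Collapses a selection sequence (which may repeat IDs) into unique IDs
   * mapped to their number of occurrences, preserving first-seen order.
   * Hypothetical helper, not part of ColoradoRLA.
   */
  static Map<Long, Integer> multiplicities(final List<Long> selection) {
    final Map<Long, Integer> result = new LinkedHashMap<>();
    for (final Long id : selection) {
      result.merge(id, 1, Integer::sum);
    }
    return result;
  }

  public static void main(final String[] args) {
    // Made-up selection: ballot 7 is drawn three times, ballot 3 twice.
    final List<Long> selection = Arrays.asList(7L, 3L, 7L, 9L, 3L, 7L);
    final Map<Long, Integer> counts = multiplicities(selection);

    // Each ballot appears once in the list that is actually audited...
    final List<Long> deduplicated = new ArrayList<>(counts.keySet());

    // ...but contributes its full multiplicity to the running count.
    int counted = 0;
    for (final Long id : deduplicated) {
      counted += counts.get(id);
    }

    System.out.println(deduplicated); // [7, 3, 9]
    System.out.println(counted);      // 6
  }
}
```

In this sketch each ballot is counted exactly once, with its full multiplicity. If a ballot that had already been counted stayed in the list for a later round, its multiplicity would be added a second time.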
In fact, the code that is supposed to remove those CVRs from the list to be audited in the next round is in lines 555-562 above:
for (final Round round : the_cdb.rounds()) {
  for (final Long cvr_id : round.ballotSequence()) {
    if (unique_new_cvr_ids.contains(cvr_id)) {
      unique_new_cvr_ids.remove(cvr_id);
      sorted_deduplicated_new_cvrs.remove(Persistence.getByID(cvr_id,
                                                              CastVoteRecord.class));
    }
  }
}
If that code isn't actually removing those previously audited CVRs, something else is going wrong somewhere. Either it was also going wrong in our version for Colorado, in which case we got very, very lucky, including in some ridiculous testing that resulted in audits of the entire CVR set; or it has started going wrong more recently, because some other change elsewhere in the system is causing Persistence.getByID() (or equivalence comparisons among CastVoteRecord instances) to not work properly in this case.
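For reference, the equality hypothesis can be illustrated in isolation. List.remove(Object) finds its target with equals(), and it returns false without throwing when nothing matches, so a failed removal is silent. The sketch below uses two hypothetical record classes (IdentityRecord and ValueRecord); it makes no claim about how CastVoteRecord or Persistence.getByID() actually behave.

```java
import java.util.ArrayList;
import java.util.List;

public class RemoveEqualitySketch {
  /** Hypothetical record that inherits identity-based equals() from Object. */
  static final class IdentityRecord {
    final long id;
    IdentityRecord(final long id) { this.id = id; }
  }

  /** Hypothetical record whose equals() compares by id. */
  static final class ValueRecord {
    final long id;
    ValueRecord(final long id) { this.id = id; }

    @Override
    public boolean equals(final Object other) {
      return other instanceof ValueRecord && ((ValueRecord) other).id == id;
    }

    @Override
    public int hashCode() { return Long.hashCode(id); }
  }

  /** Tries to remove a distinct instance with the same id; identity equality finds no match. */
  static boolean removeWithIdentityEquality() {
    final List<IdentityRecord> list = new ArrayList<>();
    list.add(new IdentityRecord(42L));
    return list.remove(new IdentityRecord(42L));
  }

  /** Same attempt with id-based equality; the element is found and removed. */
  static boolean removeWithValueEquality() {
    final List<ValueRecord> list = new ArrayList<>();
    list.add(new ValueRecord(42L));
    return list.remove(new ValueRecord(42L));
  }

  public static void main(final String[] args) {
    System.out.println(removeWithIdentityEquality()); // false: nothing removed, no exception
    System.out.println(removeWithValueEquality());    // true
  }
}
```

Checking the boolean returned by the remove() call in the loop above during a failing run would be one way to confirm or rule out this explanation.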
We can reproduce the bug using a CVR file that I can't share. I haven't had time to produce a public test case on my own, but I think these parameters are the salient ones.
Number of CVRs: 3221
Margin: 233 votes, 7.23378%
Discrepancies: only one two-vote overstatement in the first round
Assuming the CVR file contains a vote for the declared winner of the first contest in the first CVR to be sampled (in selection order), and the loser is named "Loser", this command line runs the audit, with an audit board interpretation of a vote for the loser on the first sampled ballot.
main.py -d 10 -r 0.05 -s 87642966857752123362 -p "0 200" -l Loser -f CVR-3221.csv
In the process it generates a sample size of 87 in the first round and 184 in the second round, with a total of 7 duplicated selections. At the end of the second round, it thinks there is more to do and spins in the loop.
For clarity: does the example you just mentioned reproduce the problem on their code, or on our code?
The bug shows up in ColoradoRLA v1.1.0.3
OK.
Is there any way for you to anonymize the CVR file you’re using to cause the bug (global search and replace, etc.) and attach it here? Otherwise I’ll basically be shooting in the dark trying to replicate it for myself, which I’d like to avoid if at all possible.
I've uploaded a CVR that reproduces this bug on the current master (commit fbbc9aba) as part of the archive freeze.zip.
Running it with a command like the one noted above
main.py -d 10 -r 0.05 -s 87642966857752123362 -p "0 200" -l Loser -f freeze-CVR.csv
produces the detailed output in freeze.sm and the server logs in server.stdout.
After noting that Java is in an infinite loop, killing it and restarting it, and running this:
rla_export -r -e freeze-export
we get the export database data provided in the freeze-issue-916-export.zip file. These other files are also in the zip above.