warwickmm
I spot-checked a few of these:

* `2018/counties/20180306__tx__primary__san_saba__precinct.csv`
* `2018/counties/20180306__tx__primary__jefferson__precinct.csv`
* `2018/20180306__tx__primary__henderson__precinct.csv`

and the duplicate entries were present in the initial commit of the files. Several of them have...
If the `party` value differs between two rows, they are not counted as duplicates. I'm basically hashing each row and looking for duplicate hashes. Below are the NY errors:...
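For reference, the row-hashing check described above can be sketched roughly as follows. This is a minimal illustration, not the actual test code: the function name `find_duplicate_rows`, the column header, the sample rows, and the choice of SHA-256 are all assumptions made for the example.

```python
import csv
import hashlib
import io
from collections import defaultdict


def find_duplicate_rows(csv_file):
    """Group the data rows of a CSV by hash; return groups with 2+ rows.

    Each group is a list of (row_number, row) tuples. Row numbers are
    1-based and count the header as row 1 (an assumption of this sketch).
    """
    groups = defaultdict(list)
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    for row_number, row in enumerate(reader, start=2):
        # Join on a separator unlikely to appear in the data so that
        # ['a', 'bc'] and ['ab', 'c'] hash differently.
        key = "\x1f".join(row).encode("utf-8")
        digest = hashlib.sha256(key).hexdigest()
        groups[digest].append((row_number, row))
    return [rows for rows in groups.values() if len(rows) > 1]


# Hypothetical sample data: rows 2 and 4 are identical, while row 3
# differs only in `party` and is therefore not a duplicate.
sample = io.StringIO(
    "county,precinct,office,district,party,candidate,votes\n"
    "Example,Precinct 1,Governor,,AAA,Candidate X,0\n"
    "Example,Precinct 1,Governor,,BBB,Candidate X,0\n"
    "Example,Precinct 1,Governor,,AAA,Candidate X,0\n"
)
duplicates = find_duplicate_rows(sample)
for group in duplicates:
    print(f"{len(group)} duplicate rows detected:")
    for row_number, row in group:
        print(f"  Row {row_number}: {row}")
```

Because every column goes into the hash, two rows only count as duplicates when all fields match, including `party`.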
Looking into this example

```
2018/counties/20181106__ny__general__onondaga__precinct.csv
* 460 duplicate rows detected:
Row 3420: ['Onondaga', 'Geddes 20', 'Governor', '', 'Write-ins', '', '0']
Row 3432: ['Onondaga', 'Geddes 20', 'Governor', '', 'Write-ins', '',...
```
I looked into one of the CA duplicates:

```
2020/20201103__ca__general__glenn__precinct.csv
* 2 duplicate rows detected:
Row 361: ['Glenn', '50061', 'President', '', 'PFP', 'Gloria LaRiva', '0']
Row 364: ['Glenn', '50061', 'President',...
```
Given the above examples, I think it makes sense to start testing for duplicate entries in all the repos. I don't think we should simply remove the duplicates, as some...
No problem. Can you create a new repo named `openelections-data-tests` to house these new data tests? I can then start testing the code.
Thanks @dwillis. Would it be possible to get write access to the new repo?
The [data tests](https://github.com/openelections/openelections-data-tx/actions/workflows/data_tests.yml) and [format tests](https://github.com/openelections/openelections-data-tx/actions/workflows/format_tests.yml) workflows are an attempt at this.
Is there another source for this, or is this something we would want to reach out to the county for clarification?
No problem. The vote breakdown totals test is turning out to be more useful than I anticipated.