Data table doesn't match field documentation
The fields documented in the README don’t appear to match those in the 2016 General data file. Here are the first few documented fields:
| Field Description | Length | Data Type |
|---|---|---|
| Election Year | 4 | Numeric |
| Election Type (G = General) | 1 | Character |
| County Code * | 2 | Numeric |
| Precinct Code | 7 | Numeric |
| Candidate Office Rank | 2 | Numeric |
| Candidate District | 3 | Numeric |
And these are the first few lines of the data file:
"2016","G",1,10,1,0,1,1,"USP","DEM",2016C0483,"CLINTON","HILLARY","","",119,4,33,193,6,"ABBOTTSTOWN","","","","",0,5,001,0010,0,19,33,193
"2016","G",1,20,1,0,1,1,"USP","DEM",2016C0483,"CLINTON","HILLARY","","",143,4,33,193,6,"ARENDTSVILLE","","","","",0,10,001,0020,0,19,33,91
"2016","G",1,30,1,0,1,1,"USP","DEM",2016C0483,"CLINTON","HILLARY","","",83,4,33,193,6,"BENDERSVILLE","","","","",0,15,001,0030,0,19,33,193
"2016","G",1,40,1,0,1,1,"USP","DEM",2016C0483,"CLINTON","HILLARY","","",257,4,33,193,4,"BERWICK","","","","",0,20,001,0040,0,19,33,193
"2016","G",1,50,1,0,1,1,"USP","DEM",2016C0483,"CLINTON","HILLARY","","",148,4,33,193,6,"BIGLERVILLE","","","","",0,25,001,0050,0,19,33,193
The lengths and positions don’t match, and the file has no header row (unlike other OE data sets) to disambiguate.
I think you can trust the order of the field descriptions and ignore length and data type. Note that the third column includes numbers 1 through 67 and there are 67 counties in PA [Ballotpedia]. Things seem to check out if you sniff around some of the other columns too.
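For anyone who wants to repeat that check, here is a rough sketch in Python. It assumes the file is ordinary comma-separated text with no header row, and it labels only the first six columns using the README's field order. The snake_case field names and the helper functions are made up for illustration; the two sample rows are copied from the lines quoted above.

```python
import csv
import io

# First six fields in README order. These snake_case names are
# illustrative; the README only gives descriptions.
FIELD_NAMES = [
    "election_year",
    "election_type",
    "county_code",
    "precinct_code",
    "candidate_office_rank",
    "candidate_district",
]

# Two rows copied verbatim from the 2016 General sample quoted above.
SAMPLE = (
    '"2016","G",1,10,1,0,1,1,"USP","DEM",2016C0483,"CLINTON","HILLARY","","",'
    '119,4,33,193,6,"ABBOTTSTOWN","","","","",0,5,001,0010,0,19,33,193\n'
    '"2016","G",1,20,1,0,1,1,"USP","DEM",2016C0483,"CLINTON","HILLARY","","",'
    '143,4,33,193,6,"ARENDTSVILLE","","","","",0,10,001,0020,0,19,33,91\n'
)


def label_leading_fields(text):
    """Pair the first six values of each row with the README field names."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        # zip stops at the shorter sequence, so the later columns are ignored.
        rows.append(dict(zip(FIELD_NAMES, record)))
    return rows


def county_codes_plausible(rows, county_count=67):
    """True if every county code falls in 1..county_count (PA has 67 counties)."""
    return all(1 <= int(row["county_code"]) <= county_count for row in rows)


rows = label_leading_fields(SAMPLE)
print(rows[0])
print(county_codes_plausible(rows))
```

Running the same two helpers over the full file, rather than two rows, would be the actual test of whether the third column behaves like a county code.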
It looks like the table in the README is just copied from the communication provided by the PA Department of State. I have two hypotheses:
- The original description provided by PA was off.
- Some leading zeros got dropped when OE converted the PA-provided files.
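One quick way to probe the second hypothesis is to zero-pad each value to its documented length and see whether it fits. The sketch below uses the lengths from the README table for the first six fields; the field names and the `compare_widths` helper are hypothetical. A value that fits is merely consistent with dropped leading zeros; it does not rule out the first hypothesis.

```python
# (illustrative name, documented length) for the first six README fields.
DOCUMENTED = [
    ("election_year", 4),
    ("election_type", 1),
    ("county_code", 2),
    ("precinct_code", 7),
    ("candidate_office_rank", 2),
    ("candidate_district", 3),
]

# First six values of the first sample row quoted in this issue.
FIRST_ROW = ["2016", "G", "1", "10", "1", "0"]


def compare_widths(values):
    """Return (name, raw value, zero-padded value, fits documented length)."""
    report = []
    for (name, width), value in zip(DOCUMENTED, values):
        report.append((name, value, value.zfill(width), len(value) <= width))
    return report


report = compare_widths(FIRST_ROW)
for name, raw, padded, fits in report:
    print(f"{name}: raw={raw!r} padded={padded!r} fits={fits}")
```

If any value in the full file turned out to be longer than its documented length, that would point toward the documented layout itself being off.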
I'll ask PA whether the record layout changed for the 2016 files and will report back here.
Is this now resolved?
We still need to do 2016, but this issue doesn't occur for 2018 general.