
Potentially broken approach to reading csv files

Open warwickmm opened this issue 4 years ago • 8 comments

The following pattern is used often to read csv files:

with open(filename, 'rU') as csvfile:
    reader = unicodecsv.DictReader(csvfile)

I think this worked in Python 2 because str and bytes were the same type there. In Python 3 it breaks: unicodecsv expects the file to be opened in binary mode, but 'rU' opens it in text mode, so the reader receives str lines and fails when it tries to decode them.

For example, the following fails in Python 3 with the error AttributeError: 'str' object has no attribute 'decode':

import unicodecsv

filename = 'openelex/us/md/mappings/md.csv'
with open(filename, "r") as data:
    reader = unicodecsv.DictReader(data)
    for row in reader:
        print(row)
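
The root cause can be seen without unicodecsv at all. Below is a minimal sketch, assuming Python 3 and using a throwaway temp file with made-up contents in place of md.csv: a file opened in text mode yields str lines, which have no decode method, while binary mode yields bytes lines, which do.

```python
import os
import tempfile

# Write a tiny stand-in file; the contents are made up and md.csv is not needed here.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("name,votes\nSmith,10\n")
    path = tmp.name

# Text mode: lines come back as str.
with open(path, "r") as text_file:
    text_line = text_file.readline()

# Binary mode: lines come back as bytes.
with open(path, "rb") as binary_file:
    binary_line = binary_file.readline()

os.remove(path)

# In Python 3 only bytes objects have a .decode() method.
print(type(text_line).__name__, hasattr(text_line, "decode"))
print(type(binary_line).__name__, hasattr(binary_line, "decode"))
```

Anything that calls .decode() on each line, as the error message suggests unicodecsv does, can therefore only work on a file opened in binary mode.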

Using csv instead of unicodecsv fixes the issue.

import csv

filename = 'openelex/us/md/mappings/md.csv'
with open(filename, "r") as data:
    reader = csv.DictReader(data)
    for row in reader:
        print(row)

Is there something wrong with my setup, or is this broken for other people as well?

warwickmm, Apr 03 '21 22:04

FYI, by using csv instead of unicodecsv together with one other small fix, I can get most of the failing tests in test_md_datasource.py to pass. However, I'm not sure if anything else breaks as a result. But given my understanding of how unicodecsv works with python2 vs. python3, it's a bit unclear to me how things are currently working.

warwickmm, Apr 03 '21 23:04

This seems related to https://github.com/jdunck/python-unicodecsv/issues/65.

warwickmm, Apr 03 '21 23:04

@warwickmm yeah, this is an artifact of using python2, but we should be using python3, so we can remove unicodecsv and just replace it with the csv module.
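
A minimal sketch of what that replacement could look like in Python 3. It assumes the files are UTF-8 encoded, which this thread does not confirm; the helper name read_rows and the sample data are hypothetical and not taken from the repo. Passing newline="" follows the csv module's own recommendation, and the 'U' mode flag is dropped because Python 3.11 and later reject it.

```python
import csv
import os
import tempfile


def read_rows(filename, encoding="utf-8"):
    """Return all rows of a CSV file as dicts (hypothetical helper, not from the repo)."""
    # newline="" lets the csv module handle line endings itself.
    # The UTF-8 default is an assumption about the data files.
    with open(filename, "r", newline="", encoding=encoding) as csvfile:
        return list(csv.DictReader(csvfile))


# Self-contained demonstration with made-up data, including non-ASCII text.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline="", encoding="utf-8"
) as tmp:
    tmp.write("jurisdiction,candidate\r\nAllegany,José Pérez\r\n")
    demo_path = tmp.name

rows = read_rows(demo_path)
os.remove(demo_path)
print(rows)
```

Because the text-mode file object already decodes to str, non-ASCII names come through without the extra layer that unicodecsv provided on Python 2.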

dwillis, Apr 04 '21 01:04

Ok. Do you mind my asking how any of this is working currently? It would seem to me that none of the csv files can be read properly as-is.

warwickmm, Apr 04 '21 01:04

@warwickmm it's a fair question, and the basic answer is that we've mostly not used the core repo in recent times, instead prioritizing the data conversion work that results in the openelections-data-{state} repos. But we do use it for some of the states and use Python 3 for that.

dwillis, Apr 04 '21 01:04

Thanks. If the core repo isn't used very much anymore, is there a different repo that I can look at for possible ways to contribute? Or, is the core repo still deserving of attention?

warwickmm, Apr 04 '21 03:04

@warwickmm most of our work now is done in various state-specific repos, where we put converted precinct results. For example, we're working on converting official precinct results for Texas here.

dwillis, Apr 04 '21 13:04

Thank you. I'll take a look at the state-specific repos.

warwickmm, Apr 04 '21 16:04