gerrymandertests
First Notebook example doesn't work: apparently expects a state data file to already be there?
I'm trying to run the gerrymandertests, but apparently it relies on my separately downloading state-specific files (I'm particularly interested in New Mexico) and I can't find any documentation on where to get them.
If I just run the notebook, here's the error:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-1-54dcfe840d25> in <module>
41
42 for chamber in chambers:
---> 43 chambers[chamber]['elections_df'] = g.parse_results(chambers[chamber]['filepath'])
44 chambers[chamber]['tests_df'] = g.tests_df(g.run_all_tests(
45 chambers[chamber]['elections_df'],
~/outsrc/gerrymandertests/gerrymetrics/utils.py in parse_results(input_filepath, start_year, coerce_odd_years)
12 '''
13
---> 14 df = pd.read_csv(input_filepath)
15
16 df = df[df['Year'] >= start_year]
~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
674 )
675
--> 676 return _read(filepath_or_buffer, kwds)
677
678 parser_f.__name__ = name
~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
446
447 # Create the parser.
--> 448 parser = TextFileReader(fp_or_buf, **kwds)
449
450 if chunksize or iterator:
~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
878 self.options["has_index_names"] = kwds["has_index_names"]
879
--> 880 self._make_engine(self.engine)
881
882 def close(self):
~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
1112 def _make_engine(self, engine="c"):
1113 if engine == "c":
-> 1114 self._engine = CParserWrapper(self.f, **self.options)
1115 else:
1116 if engine == "python":
~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1889 kwds["usecols"] = self.usecols
1890
-> 1891 self._reader = parsers.TextReader(src, **kwds)
1892 self.unnamed_cols = self._reader.unnamed_cols
1893
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()
FileNotFoundError: [Errno 2] File election_data/state_legislative/state_legislative_election_results_post1971.csv does not exist: 'election_data/state_legislative/state_legislative_election_results_post1971.csv'
election_data/congressional_election_results_post1948.csv comes as part of the repository, but election_data/state_legislative/ is an empty directory. Where can I get the files that it expected there?
In NM we're actively fighting for better redistricting (I'm webmaster for fairdistrictsnm.org) and I'd love to get some quantitative measurements I could show to legislators and display on the website.
Hi @akkana, the file you're looking for is here: https://github.com/PrincetonUniversity/historic_state_legislative_election_results/blob/2bf28f2ac1a74636b09dfb700eef08a4324d2650/state_legislative_election_results_post1971.csv
I'll update the notebook so that its file path points to this data set!
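The link above is a GitHub "blob" page, so saving it directly from a script would give HTML rather than the CSV. Below is a minimal sketch of fetching the raw file into the directory the notebook expects. It assumes GitHub's usual raw.githubusercontent.com URL layout and that the file is served directly; the helper names `blob_to_raw_url` and `fetch_state_legislative_csv` are made up for illustration and are not part of gerrymetrics.

```python
import os
import urllib.request

# Blob URL quoted in the comment above.
BLOB_URL = (
    "https://github.com/PrincetonUniversity/"
    "historic_state_legislative_election_results/blob/"
    "2bf28f2ac1a74636b09dfb700eef08a4324d2650/"
    "state_legislative_election_results_post1971.csv"
)
# Directory named in the FileNotFoundError message.
DEST_DIR = "election_data/state_legislative"


def blob_to_raw_url(blob_url):
    """Turn a github.com .../blob/<ref>/<path> URL into its raw-content URL.

    Hypothetical helper; assumes the standard raw.githubusercontent.com layout.
    """
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix):
        raise ValueError("not a github.com URL: " + blob_url)
    owner, repo, kind, rest = blob_url[len(prefix):].split("/", 3)
    if kind != "blob":
        raise ValueError("not a blob URL: " + blob_url)
    return "https://raw.githubusercontent.com/{}/{}/{}".format(owner, repo, rest)


def fetch_state_legislative_csv(blob_url=BLOB_URL, dest_dir=DEST_DIR):
    """Download the CSV into dest_dir and return the local path (hypothetical helper)."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(blob_url))
    urllib.request.urlretrieve(blob_to_raw_url(blob_url), dest)
    return dest
```

Calling `fetch_state_legislative_csv()` from the repository root should leave the file where `parse_results` looks for it.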
Thanks! I downloaded that and put it in election_data/state_legislative and got past that error. Now it's dying with a different error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-54dcfe840d25> in <module>
41
42 for chamber in chambers:
---> 43 chambers[chamber]['elections_df'] = g.parse_results(chambers[chamber]['filepath'])
44 chambers[chamber]['tests_df'] = g.tests_df(g.run_all_tests(
45 chambers[chamber]['elections_df'],
~/outsrc/gerrymandertests/gerrymetrics/utils.py in parse_results(input_filepath, start_year, coerce_odd_years)
34 new['District Numbers'] = grouped['District'].apply(list)
35
---> 36 if df.columns.contains('Dem Votes'):
37 new['Weighted Voteshare'] = grouped['Dem Votes'].apply(sum) / (grouped['Dem Votes'].apply(sum) +
38 grouped['GOP Votes'].apply(sum))
AttributeError: 'Index' object has no attribute 'contains'
I realized that was with the pip-installed gerrymetrics, so I ran pip uninstall gerrymetrics followed by pip install . from the checked-out code, but I got the same error. If it matters, this virtualenv's pandas reports version 1.0.1 (Python version 3.7.5).
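For context, `Index.contains` was deprecated in pandas 0.25 and removed in pandas 1.0, which is consistent with the 1.0.1 version reported above. A minimal sketch of the membership test that works across versions follows; the tiny DataFrame is invented for illustration and is not the real election data.

```python
import pandas as pd

# Toy frame with the column names that appear in the traceback.
df = pd.DataFrame({"Dem Votes": [120, 80], "GOP Votes": [100, 90]})

# The `in` operator checks column labels and does not depend on Index.contains().
has_dem_votes = "Dem Votes" in df.columns
has_other_votes = "Other Votes" in df.columns

print(has_dem_votes, has_other_votes)
```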
Hi @akkana,
I tried to reproduce your issue but was not able to do so. I created a virtual environment (python version 3.7.4) and successfully installed gerrymetrics just now. I wonder if your issue is coming up because your version of pandas does not agree with the version of pandas automatically installed by this package.
What I recommend is that you create a virtual environment, and before installing any other packages, install gerrymetrics with the following code:
python3 -m venv install_ve
source install_ve/bin/activate
pip install gerrymetrics
Let me know if that works, thanks so much!
I get exactly the same error as before when I type those three lines followed by jupyter-notebook run_gerrymandering_metrics.ipynb. I tried it outside of jupyter-notebook and got the same error: AttributeError: 'Index' object has no attribute 'contains'.
If I edit utils.py and put double underscores at either end of the "contains" in the line that's erroring in parse_results() (I can't illustrate that because apparently double underscores have a meaning in markdown), I get a little farther and it even appears to download something (some data?), but then it dies with:
File "<stdin>", line 6, in <module>
File "/home/akkana/outsrc/gerrymandertests/gerrymetrics/utils.py", line 66, in tests_df
df = yearstatedf()
File "/home/akkana/outsrc/gerrymandertests/gerrymetrics/utils.py", line 55, in yearstatedf
names=['Year', 'State'])
TypeError: __new__() got an unexpected keyword argument 'labels'
(I should warn you my utils.py line numbers will be a little off because I've inserted some print()s). And that does look like a Pandas difference, since the line with the error is creating a pd.MultiIndex with labels as a keyword arg.
This is Python 3.7.5 on Ubuntu 19.10, so probably the pandas the virtualenv is pulling in is tied to that. pandas double-underscore version is 1.0.3.
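The `labels` keyword of `pd.MultiIndex` was renamed to `codes` in pandas 0.24, and the old name was removed in pandas 1.0, which would explain the TypeError under 1.0.x. The following sketch shows the same empty two-level index built with the newer keyword. It only mirrors the call shown in the traceback and is not necessarily the fix the maintainers applied.

```python
import pandas as pd

# Same empty (Year, State) index as in yearstatedf(), using `codes` instead of `labels`.
index = pd.MultiIndex(levels=[[], []],
                      codes=[[], []],
                      names=["Year", "State"])

# An empty frame keyed by that index, ready to be filled in later.
df = pd.DataFrame(index=index)

print(list(index.names), len(df))
```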
@akkana I just pushed some code that updates the pandas syntax and data path. Will you try cloning again with the updated code and run in a virtual environment with:
python3 -m venv install_ve
source install_ve/bin/activate
pip install gerrymetrics
jupyter-notebook run_gerrymandering_metrics.ipynb
Thanks so much!
Sorry for the delay, I've been super busy with election stuff.
Following those instructions (after git pull in the gerrymandertests repo) gives this mysterious error:
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-4-9649b5edd3ef> in <module>
----> 1 import gerrymetrics as g
2 import IPython.display as ipd
3
4 from collections import defaultdict
5
~/outsrc/gerrymandertests/gerrymetrics/__init__.py in <module>
----> 1 from .metrics import *
2 from .plots import *
3 from .utils import *
~/outsrc/gerrymandertests/gerrymetrics/metrics.py in <module>
11 from __future__ import division # for python 2
12 import numpy as np
---> 13 import scipy.stats as sps
14
15
ModuleNotFoundError: No module named 'scipy'
It's mysterious because clearly scipy is there; if I run python inside the venv and run import scipy.stats as sps, it works fine. But it doesn't work inside the notebook.
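One way to check which interpreter a notebook kernel is really using is to run something like the following in a cell and compare the output with the same lines run from the venv's python. The helper name `in_virtualenv` is made up for this sketch.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv (hypothetical helper)."""
    # In a venv, sys.prefix points at the venv while sys.base_prefix points at the base install.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("interpreter:", sys.executable)
print("prefix:", sys.prefix)
print("inside a venv:", in_virtualenv())
```

If the interpreter path printed in the notebook is not under install_ve, the kernel is not seeing the packages installed there.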
Aha: that's because Ubuntu's jupyter-notebook script begins with #!/usr/bin/python3, so it runs the system Python rather than the venv's. So I ran pip install jupyterlab, then ran install_ve/bin/jupyter-notebook run_gerrymandering_metrics.ipynb. That gets me past the import error and now it dies with:
Traceback (most recent call last):
File "/home/akkana/outsrc/gerrymandertests/install_ve/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-9649b5edd3ef>", line 1, in <module>
import gerrymetrics as g
File "/home/akkana/outsrc/gerrymandertests/gerrymetrics/__init__.py", line 3, in <module>
from .utils import *
File "/home/akkana/outsrc/gerrymandertests/gerrymetrics/utils.py", line 37
if 'Dem Votes' in df.columns:
^
IndentationError: unexpected indent
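An error like this can be caught without launching the notebook by compiling the source first. The sketch below uses only the built-in `compile()`; the function name `syntax_error_in` and the two sample snippets are invented for illustration.

```python
def syntax_error_in(source, filename="<snippet>"):
    """Return a short description of the first syntax error, or None if the source compiles."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return "{}: {} (line {})".format(type(exc).__name__, exc.msg, exc.lineno)
    return None


# The second line is indented without a preceding block opener.
bad = "x = 1\n    if x:\n        pass\n"
good = "x = 1\nif x:\n    pass\n"

print(syntax_error_in(bad))
print(syntax_error_in(good))
```

To check a real file, its contents can be read and passed in, for example `syntax_error_in(open("gerrymetrics/utils.py").read(), "utils.py")`.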
Sure enough, that line is indented more than the lines before it. If I fix the indentation, I get a little farther:
TypeError Traceback (most recent call last)
<ipython-input-1-9649b5edd3ef> in <module>
39 print(chamber)
40 chambers[chamber]['elections_df'] = g.parse_results(chambers[chamber]['filepath'])
---> 41 chambers[chamber]['tests_df'] = g.tests_df(g.run_all_tests(
42 chambers[chamber]['elections_df'],
43 impute_val=impute_val,
~/outsrc/gerrymandertests/gerrymetrics/utils.py in tests_df(tests_dict)
63 '''
64
---> 65 df = yearstatedf()
66
67 for year in tests_dict:
~/outsrc/gerrymandertests/gerrymetrics/utils.py in yearstatedf()
50 '''
51
---> 52 index = pd.MultiIndex(levels=[[], []],
53 labels=[[], []],
54 names=['Year', 'State'])
TypeError: __new__() got an unexpected keyword argument 'labels'
So alas, now I'm just back to the error from two weeks ago.