OG-Core
Need better solution for cached testing files
OG-USA's testing suite is located primarily in the `./ogusa/tests/` directory. However, other tests can be found in the `./regression/` and `./cs-config/cs_config/tests/` directories, as indicated in the `pytest.ini` file.
Many of the reduced-run tests that run automatically on TravisCI use cached `.pkl` files located in the `./ogusa/tests/` and `./ogusa/tests/test_io_data/` directories. Furthermore, when all of the tests are run, which takes over 12 hours, those tests use `.pkl` and `.csv` files that are not checked into the OG-USA repository.
We need a better solution for this. Some proposals follow:
- Consolidate the tests in the `./regression/` directory into the `./ogusa/tests/` directory and simply give the regression tests the `full_run` marker.
- Create a separate public repository for the data used in testing and have the tests download that data using the `requests` library or some kind of package import.
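A minimal sketch of the second proposal, under several assumptions: `fetch_test_data` is a hypothetical helper (not part of OG-USA), the URL layout of a separate test-data repository is invented for illustration, and the standard-library `urllib.request` stands in for the `requests` library so the example has no third-party dependency. The file is downloaded once, cached locally, and optionally verified against a SHA-256 checksum so that a changed or corrupted remote file is caught rather than silently used.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path
from typing import Optional


def fetch_test_data(url: str, dest: Path, sha256: Optional[str] = None) -> Path:
    """Download a cached test file once and reuse the local copy afterwards.

    Hypothetical helper: urllib.request is used here in place of requests.
    """
    dest = Path(dest)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        tmp = dest.with_suffix(dest.suffix + ".part")
        # Stream into a temporary file first so an interrupted download
        # never leaves a truncated file at the final cached path.
        with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)
        tmp.replace(dest)
    if sha256 is not None:
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        if digest != sha256:
            # Drop the bad copy so the next run downloads it again.
            dest.unlink()
            raise ValueError(f"checksum mismatch for {dest}: {digest}")
    return dest
```

In a test module this might be called as `fetch_test_data("https://example.org/og-usa-test-data/TxFuncEst_baseline.pkl", Path("~/.cache/ogusa/TxFuncEst_baseline.pkl").expanduser())`, where both the URL and the file name are placeholders. Pinning a checksum also addresses some of the security concerns about binary files, since a test only loads a file whose contents match what was reviewed.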
cc: @jdebacker
Feels unlucky that this is Issue #666.
@rickecon Thanks for opening this issue. I think there are several things going on and I want to try to separate them. I'd break them out as follows:
1. Issues with testing when some data files are large. This is the topic of (open) Issue #623.
2. Issues with keeping binary files in the repo, which can pose security risks and whose diffs are difficult to view. This is the topic of (closed) Issue #368.
3. What to do with the regression tests? This includes:
   a. Do we add an additional 33 pickle files to the repo that represent the cached results currently used in the unit tests, or do we try to consolidate to a smaller number of files/tests?
   b. If we are going to add additional files to the repo for unit tests, do we use csv or other file formats rather than pickle files? This may be the same as or different from what we decide in general for (2).
   c. Do we change the location of the regression tests and when they are run?
I'd like to focus this issue specifically on data for regression testing, items 3a and 3b. I'd suggest discussing (1) and (2) in their respective issues. I also opened a new issue, Issue #667, to discuss 3c, which relates to issues that are independent of the format of cached test files.
Focusing on cached files for regression tests, I have the following thoughts:
- Rather than storing the SS and TPI data as dictionaries in pickle files, we can store them in csv files. These can be printed with functions like `ogusa.output_tables.tp_output_dump_table`.
- We can test just the time path outputs, since the last periods of the TP are the same as the SS; there is no need to test the SS outputs independently.
- We might be able to consolidate the number of reforms, since we don't need multiple tests of policies that change the same (or similar) parameters. On the other hand, none of the current parameterizations in the regression tests change any non-Tax-Calculator parameters, so we may want to add additional tests.
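The first two points can be sketched together. This is a minimal illustration, not OG-USA's code: `write_tp_csv`, `read_tp_csv`, `assert_tp_close`, and `tail_matches_ss` are hypothetical helpers, the variable names `K` and `r` are placeholders, and each output is assumed to be a one-dimensional series of aggregate values per period (OG-USA's actual outputs include multi-dimensional arrays, which would need flattening or one file per variable). It also assumes the time path has converged, so its final periods match the steady state within a tolerance.

```python
import csv
import math
from pathlib import Path
from typing import Dict, List

Series = Dict[str, List[float]]


def write_tp_csv(path: Path, tp: Series) -> None:
    """Write time-path series to a plain-text CSV, one column per variable."""
    names = sorted(tp)
    n_periods = len(tp[names[0]])
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["period"] + names)
        for t in range(n_periods):
            # str(float) round-trips exactly in Python 3.
            writer.writerow([t] + [tp[name][t] for name in names])


def read_tp_csv(path: Path) -> Series:
    """Read a CSV written by write_tp_csv back into a dict of series."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
    names = [c for c in reader.fieldnames if c != "period"]
    return {name: [float(row[name]) for row in rows] for name in names}


def assert_tp_close(expected: Series, actual: Series,
                    rtol: float = 1e-6, atol: float = 1e-9) -> None:
    """Raise AssertionError at the first variable/period that differs."""
    if set(expected) != set(actual):
        raise AssertionError("variable names differ")
    for name in expected:
        exp, act = expected[name], actual[name]
        if len(exp) != len(act):
            raise AssertionError(f"{name}: number of periods differs")
        for t, (e, a) in enumerate(zip(exp, act)):
            if not math.isclose(e, a, rel_tol=rtol, abs_tol=atol):
                raise AssertionError(f"{name}[{t}]: expected {e}, got {a}")


def tail_matches_ss(tp: Series, ss: Dict[str, float], n_tail: int = 1,
                    rtol: float = 1e-6, atol: float = 1e-9) -> bool:
    """True if the last n_tail periods of every TP series equal the SS value."""
    return all(
        math.isclose(v, ss[name], rel_tol=rtol, abs_tol=atol)
        for name, series in tp.items()
        for v in series[-n_tail:]
    )
```

Because the cached file is plain text, a change in model results shows up as a readable diff in a pull request, and `tail_matches_ss` is one way to check the SS without storing it as a separate cached file.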
We've moved the cached testing files out of the `OG-Core/ogcore` directory, which means that they do not affect the size of the `ogcore` package (e.g., when downloaded from PyPI). This has largely resolved the issues noted here.