Need better solution for cached testing files

Open rickecon opened this issue 4 years ago • 3 comments

OG-USA's testing suite is located primarily in the ./ogusa/tests/ directory. However, other tests can be found in the ./regression/ and ./cs-config/cs_config/tests/ directories as indicated in the pytest.ini file.

Many of the reduced-run tests that run automatically on TravisCI use cached .pkl files located in the ./ogusa/tests/ and ./ogusa/tests/test_io_data/ directories. Furthermore, the full test suite, which takes over 12 hours to run, uses .pkl and .csv files that are not checked in to the OG-USA repository.

We need a better solution for this. Some proposals are the following.

  • Consolidate the tests in the ./regression/ directory into the ./ogusa/tests/ directory and simply give the regression tests the full_run marker.
  • Create a separate public repository for the data used in testing and have the tests download that data using the requests library or using some kind of package import.
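To make the second proposal concrete, here is a minimal sketch of a download-and-cache helper, assuming the test data lives at some public base URL. The names `fetch_test_file` and `_download_with_requests`, the URL layout, and the optional checksum argument are all hypothetical; nothing like this exists in OG-USA as far as this thread shows. The only `requests` calls used are `requests.get` and `Response.raise_for_status`.

```python
import hashlib
from pathlib import Path


def _download_with_requests(url):
    """Default downloader (hypothetical helper). requests is imported lazily
    so that importing this module does not require it."""
    import requests

    response = requests.get(url, timeout=60)
    response.raise_for_status()
    return response.content


def fetch_test_file(name, base_url, cache_dir, sha256=None,
                    downloader=_download_with_requests):
    """Return a local path to `name`, downloading it only if it is not cached.

    `base_url` is an assumed location for a separate test-data repository.
    If `sha256` is given, the downloaded bytes must match that hex digest.
    """
    target = Path(cache_dir) / name
    if not target.exists():
        payload = downloader(f"{base_url.rstrip('/')}/{name}")
        if sha256 is not None and hashlib.sha256(payload).hexdigest() != sha256:
            raise ValueError(f"checksum mismatch for {name}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)
    return target
```

A test could call `fetch_test_file(...)` from a fixture so that the long-running tests pull their baselines on first use and reuse the local copy afterwards. Passing the downloader as an argument keeps the helper itself testable without network access.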

cc: @jdebacker

rickecon avatar Oct 22 '20 18:10 rickecon

Feels unlucky that this is Issue #666.

rickecon avatar Oct 22 '20 18:10 rickecon

@rickecon Thanks for opening this issue. I think there are several things going on and I want to try to separate them. I'd break them out as:

  1. Issues with testing when some data files are large. This is the topic of (open) Issue #623.
  2. Issues with keeping binary files in the repo, which could pose security risks and whose diffs are difficult to view. This is the topic of (closed) Issue #368.
  3. What to do with the regression tests? This includes:
     a. Do we add an additional 33 pickle files to the repo that represent the cached results currently used in the unit tests, or do we try to consolidate to a smaller number of files/tests?
     b. If we are going to add additional files to the repo for unit tests, do we try to use csv or other file formats rather than pickle files? This may be the same as or different from what we decide in general for (2).
     c. Do we change the location of the regression tests and when they are run?

I'd like to focus this issue specifically on data for regression testing, items 3a and 3b. I'd suggest discussing (1) and (2) in their respective issues. I also opened a new issue, Issue #667, to discuss 3c, which relates to issues that are independent of the format of cached test files.

jdebacker avatar Oct 22 '20 19:10 jdebacker

Focusing on cached files for regression tests, I have the following thoughts:

  • Rather than the pickle files with dictionaries of SS and TPI data, we can store these in csv files. These can be printed with functions like ogusa.output_tables.tp_output_dump_table.
  • We can test just time path outputs since the last periods of the TP are the same as the SS, no need to test the SS outputs independently.
  • We might be able to consolidate the number of reforms since we don't need multiple tests of policies that change the same (or similar) parameters. On the other hand, no current parameterizations in the regression tests test any changes in non-Tax-Calculator parameters, so we may want to add additional tests.
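As a rough illustration of the first two points, the sketch below stores time path output as a plain csv baseline and compares a new run against it within a tolerance. It uses only the standard library and does not call ogusa.output_tables; the function names, the one-column-per-variable layout, and the tolerance are assumptions made for the example.

```python
import csv
import math


def write_time_paths(path, time_paths):
    """Write {variable: [value per period]} as csv, one column per variable."""
    names = sorted(time_paths)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(names)
        # zip(*columns) turns per-variable columns into per-period rows
        writer.writerows(zip(*(time_paths[n] for n in names)))


def read_time_paths(path):
    """Read a csv written by write_time_paths back into {variable: [floats]}."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        names = next(reader)
        columns = {n: [] for n in names}
        for row in reader:
            for n, cell in zip(names, row):
                columns[n].append(float(cell))
    return columns


def assert_matches_baseline(time_paths, baseline, rel_tol=1e-6):
    """Fail if any variable differs from the baseline in any period."""
    assert time_paths.keys() == baseline.keys(), "variable sets differ"
    for name, expected in baseline.items():
        actual = time_paths[name]
        assert len(actual) == len(expected), f"{name}: path length differs"
        for t, (a, e) in enumerate(zip(actual, expected)):
            assert math.isclose(a, e, rel_tol=rel_tol, abs_tol=1e-12), (
                f"{name}[{t}]: {a} != {e}"
            )
```

Because the final row of such a file holds the last period of the time path, a comparison like this would also cover the steady-state values under the argument made above, and the csv baseline can be diffed in a pull request in a way a pickle cannot.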

jdebacker avatar Oct 22 '20 19:10 jdebacker

We've moved the cached testing files out of the OG-Core/ogcore directory, which means that they do not impact the size of the ogcore package (e.g., when downloaded from PyPI). This has largely resolved the issues noted here.

jdebacker avatar Oct 18 '23 17:10 jdebacker