
Simplify and standardize tests that compare against pandas

Open vyasr opened this issue 3 years ago • 8 comments

A large number of cudf's current tests are roughly twice as long as they need to be because they contain two copies of the same set of commands, once for cudf and once for pandas. Beyond unnecessarily doubling the line count, this introduces room for divergence: different tests may be initialized with either cudf or pandas objects, potentially doubling the parametrization or fixtures. It also forces developers to read the entire function to find the places where expected pandas/cudf divergences are being worked around.

This PR introduces a new decorator for tests that perform a pandas/cudf comparison. Such a test must accept a parameter lib indicating which library to run with; the decorator then calls the test twice, once with cudf and once with pandas, and compares the two returned results. The final assertion can be customized for cases that need something other than a simple assert_eq. The remaining parameters of the test function can be anything normally passed to a pytest test; fixtures and parameters are forwarded transparently to both invocations.

vyasr avatar Jul 01 '22 21:07 vyasr

As part of this PR, I would also like to address the possibility of additional forms of verification. One particularly important one is ensuring that cudf correctly mimics pandas view/copy semantics. We currently have numerous APIs that do not do this, see e.g. #7374 and #7487. We should make it possible to test that modifying the outputs of commands has the expected effect on the inputs, as well.
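[Editor's note] In plain pandas terms, the kind of check proposed here looks like the following: mutate the output of an operation and assert that the input is, or is not, affected. A cudf test would run the same steps on a cudf.DataFrame and require matching behavior. The operations chosen below are illustrative, not taken from the PR.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# rename returns a new object by default, so writes to the result
# must not be visible through the input.
renamed = df.rename(columns={"a": "b"})
renamed["b"] = [9, 9, 9]
assert df["a"].tolist() == [1, 2, 3]

# .copy() likewise detaches the result from the input.
copied = df["a"].copy()
copied.iloc[0] = 99
assert df["a"].iloc[0] == 1
```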

vyasr avatar Jul 19 '22 02:07 vyasr

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Aug 18 '22 02:08 github-actions[bot]

Codecov Report

Base: 88.11% // Head: 88.11% // No change to project coverage :thumbsup:

Coverage data is based on head (9f8b936) compared to base (9f8b936). Patch has no changes to coverable lines.

:exclamation: Current head 9f8b936 differs from pull request most recent head 359aa69. Consider uploading reports for the commit 359aa69 to get more accurate results

Additional details and impacted files
@@              Coverage Diff              @@
##           branch-22.12   #11191   +/-   ##
=============================================
  Coverage         88.11%   88.11%           
=============================================
  Files               133      133           
  Lines             21881    21881           
=============================================
  Hits              19281    19281           
  Misses             2600     2600           



codecov[bot] avatar Aug 30 '22 00:08 codecov[bot]

Note that the test that's currently failing is expected to fail (see #7374).

vyasr avatar Sep 08 '22 19:09 vyasr

These tests must accept a parameter lib indicating whether to run with cudf or pandas. The test will then be called twice, once with cudf and once with pandas.

This confuses me - why specify cudf or pandas if it will be called once with each regardless?

thomcom avatar Sep 08 '22 20:09 thomcom

This is pretty cool! Some parts feel a bit overengineered and break from pytest conventions, specifically through the use of "assert" injection. From the test logic alone, with no knowledge of the utilities and variable injection happening behind the scenes, it is tough to recognize the patterns in place and to see how to debug or write new tests. (This is a hard part of learning pytest fixtures for injecting data as well, but there are extensive docs that teach that concept by example.) If we commit to this approach, we can probably improve the learnability of this style of test writing through clear naming and examples in the docs.

Currently it's easy to insert breakpoint() and play with gdf, pdf objects in the test of interest, and using this kind of decorator breaks that. Additionally, I really like that (despite the obvious repetition) it is extremely clear what goes wrong in a test because our tests currently follow the Arrange-Act-Assert pattern. https://automationpanda.com/2020/07/07/arrange-act-assert-a-pattern-for-writing-good-tests/

I agree with the intentions of this PR: there is room for improvement in the repeated logic, and we need to think about testing copy/view semantics more carefully. If there is a middle ground, I would be satisfied with a few helper functions that eliminate some boilerplate, provided they accomplish a similar goal to these decorators while staying reasonably close to Arrange-Act-Assert in the test definitions (rather than requiring "Arrange" to be performed by a fixture, defining only the "Act" function in the test body, and injecting a custom "Assert" function in special cases while assuming a common "Assert" otherwise).

For example, I'd be happy with:

@pytest.mark.parametrize("values", [...])
def test_concat(values):
    def concat(lib):
        return lib.concat([lib.Series(values), lib.Series(values)])
    ps, gs = arrange_and_act(concat)
    assert_eq(ps, gs)

It's easy to breakpoint(), keeps the arrange/act phases separate from assertions, and reduces LOC by a few.
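[Editor's note] The arrange_and_act helper named in the example above could be as small as the sketch below. The name and contract come from this comment, not from cudf's codebase, and the pandas fallback is only there so the sketch runs on a machine without a GPU.

```python
import pandas


def arrange_and_act(act_fn):
    """Call act_fn once per library; return (pandas_result, cudf_result)."""
    try:
        import cudf  # assumed available on a GPU machine
    except ImportError:
        import pandas as cudf  # fallback so the sketch runs without a GPU
    return act_fn(pandas), act_fn(cudf)


# Usage mirroring the test sketched in the comment above:
ps, gs = arrange_and_act(
    lambda lib: lib.concat([lib.Series([1]), lib.Series([2])])
)
```

Because the helper only runs the "Act" phase, the test body keeps its own arrange code and its own final assertion, which is the separation the comment is asking for.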

bdice avatar Sep 08 '22 20:09 bdice

These tests must accept a parameter lib indicating whether to run with cudf or pandas. The test will then be called twice, once with cudf and once with pandas.

This confuses me - why specify cudf or pandas if it will be called once with each regardless?

I believe lib is an auto parameter from pytest that will be parameterized to be either cudf or pandas. Not user defined.

isVoid avatar Sep 08 '22 21:09 isVoid

@thomcom: @isVoid is correct, the lib parameter is automatically provided to any test with the decorator, it's just the convention I've established for these tests to receive the parameter. The tests should be written so that they work whether lib is cudf or pandas (that's how it verifies API compatibility).

@bdice those are good points. My responses:

  • I agree that some of it feels overengineered. For instance, if you can think of a way to implement this with a simpler decorator that doesn't require exec I would be very happy, this was just the easiest way to get all the argument/parameter handling working. What other aspects feel too complex that you would like to see simplified?
  • While it can be helpful to insert breakpoints, most of the time pytest --pdb gets you the bulk of the benefit. With this decorator's design there is no gap between when the objects are created and when the assertion runs: if a test fails before the assertion you can insert breakpoints into your own code, and otherwise --pdb drops you at a point where pdf/gdf can be inspected. Do you think there are many cases where these two options are insufficient?
  • I think one of the most important goals of this new approach is to make special-casing cudf vs. pandas feel like an antipattern. The current approach makes it feel natural to apply specialized postprocessing to the pandas or cudf outputs. We should discourage that more strongly, forcing ourselves to adhere strictly to pandas compatibility except where we have made a deliberate design decision to behave differently. Breaking the logic into more helper functions makes that less obvious, since you could take the returned values and operate on them; IMO that is a weakness of the helper-function approach (which of course trades off against being more explicit). My main point is that while reducing LOC matters, it is just as important to encourage writing tests in a way that intrinsically maximizes pandas compatibility.
  • I had a hard time coming up with a reasonable way to implement the copy/view semantics decorator, and that part feels the most overengineered to me. Do you agree? If so, do you have any thoughts on how you might improve that, or an alternative approach to propose there?

vyasr avatar Sep 09 '22 17:09 vyasr

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Oct 09 '22 18:10 github-actions[bot]

I'm going to close this for now. The advent of cudf.pandas means that we can largely rely on using the pandas unit test suite itself for compatibility. Our best path forward is probably going to be to contribute more tests upstream for cases where we're primarily validating directly against pandas. Also, as pylibcudf matures we'll be focusing more on lower-level testing that should help us clear a lot of cruft from our tests, at which point we can potentially reevaluate this PR.

vyasr avatar Dec 19 '23 00:12 vyasr