
Improve test_finite_difference and test_fit_result


Hey guys,

I worked on test_finite_difference and test_fit_result so that they correspond more to the pytest style.

A few words about 'test_fit_result.py': it's not possible to use a fixture in a parametrization, which is why I used a dict to store and call the results. I didn't touch the last test, because the converted version would be more confusing than helpful.

Merry Christmas and a Happy New Year

antonykamp avatar Dec 25 '19 10:12 antonykamp

The test test_constrained/test_constrained_dependent_on_matrixmodel didn't fail because of this PR: if I clone the master branch, the test doesn't pass either.

antonykamp avatar Dec 25 '19 11:12 antonykamp

I could compress the fixtures down to three: the parameters a and b, and the result_dict. In that case, all the results would be built in a single fixture, which returns a dict with all of them. The parameters a and b are used in the test method test_fitting.

By the way, I'm looking into fixtures that return several items.

antonykamp avatar Dec 27 '19 20:12 antonykamp

After some longer research, I'm sticking with the dictionary solution :D

antonykamp avatar Dec 27 '19 21:12 antonykamp

After some longer research, I'm sticking with the dictionary solution :D

Could you share some of your considerations? That way we can have a short discussion about it here, without me also having to do the research.

pckroon avatar Dec 28 '19 14:12 pckroon

Just for clarity: you want to create a parametrized fixture whose params are the different results of the models, and apply that fixture to different tests. But then it becomes difficult to write tests in which each model has a different comparison partner; that would require a second, parallel parametrization. Besides, models often have to be left out (or only one model is tested), and in my eyes you quickly lose the overview. The tests are not only made for TravisCI, but also for people who need to understand them :D

In my eyes, the dictionary allows models to be added quickly, tests to be implemented easily and clearly, and bugs to be spotted quickly.

antonykamp avatar Dec 29 '19 10:12 antonykamp

Very clear, thanks for the write-up. To sum up: the tests are too diverse? Feel free to reorganise/change the tests so that they become more uniform, if you think that makes sense.

As for the tests in finite_difference, we'll need to find a way to use models in pytest.mark.parametrize without making a million different fixtures. (e.g. @pytest.mark.parametrize('model', [modelA, modelB, modelC, ...])). I don't quite know how to do this yet, and I'm hoping you have a good idea :) We'll need it for the rest of the tests anyway.

pckroon avatar Dec 29 '19 19:12 pckroon

On the one hand, it's difficult to make the tests uniform. For example, only two results use constraints, so we have to separate them. Also, chained_result often steps out of line.

On the other hand, the tests can be made more uniform than I initially thought. In some cases a model wasn't mentioned even though it could have been passed in.

EDIT: For a long time I thought it isn't possible to put fixtures into a parametrization, because it was a feature request some years ago, but I didn't realise that a pytest offshoot can do it. With pytest-cases I would have some new possibilities, for example replacing the dict. With pure pytest, however, it is not possible...

antonykamp avatar Dec 29 '19 22:12 antonykamp

I never said it would be easy :) For constraints: you might be able to parametrize them as [a==b] and use [] for no constraints. Or just None. ChainedMinimizers and their results are a bit magic, so I can see those becoming a separate case. And let's stick to pure pytest for now and see what we can cook up. It may be an idea to create an alphabet of parameters (a-e) / variables (x-z) in the global namespace, and use those to make models for parametrized tests?
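A rough sketch of that "alphabet" idea (a minimal example assuming symfit's parameters/variables/Model API; the test name and the final assertion are only placeholders, not the real checks):

import pytest
from symfit import parameters, variables, Model

# Module-level "alphabet" of symbols, reused by the parametrized cases below.
a, b, c = parameters('a, b, c')
x, y = variables('x, y')


@pytest.mark.parametrize('model', [
    Model({y: a * x + b}),
    Model({y: a * x**2 + b * x + c}),
])
def test_model_parametrization(model):
    # Placeholder assertion; a real test would compare the analytical
    # Jacobian against a finite-difference estimate here.
    assert set(model.params) <= {a, b, c}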

pckroon avatar Dec 30 '19 15:12 pckroon

It's never easy ;-) Some ideas...

  • I saw that if a model is initiated with None constraints, it gets an empty list. We could use that, because the test that checks the constraints uses a for-each loop.
  • I just want to know if I understood it right: models with ChainedMinimizer use LeastSquares as objective, right?
  • It's not beautiful, but we could test the minimizers like this without leaving chained_result out.
# Check the minimizer type; for a ChainedMinimizer also check its sub-minimizers in order.
assert isinstance(result_name.minimizer, result_minimizer)
if isinstance(result_name.minimizer, ChainedMinimizer):
    for minimizer, cls in zip(result_name.minimizer.minimizers, [BFGS, NelderMead]):
        assert isinstance(minimizer, cls)

(Maybe we could connect the assert statement with the if statement)

antonykamp avatar Dec 30 '19 17:12 antonykamp

  • I saw that if a model is initiated with None constraints, it gets an empty list. We could use that, because the test that checks the constraints uses a for-each loop.

Sure. You could even do an if constraints: ..., which will be False for both None and an empty list.

  • I just want to know if I understood it right: models with ChainedMinimizer use LeastSquares as objective, right?

Not necessarily. Maybe in the current tests, but it's not generally true.

  • It's not beautiful, but we could test the minimizers like this without leaving chained_result out.

In that case I'd prefer kicking all chained minimizer tests out to either a separate test, or even a separate file. As you said earlier: making the tests understandable is at least as important.

pckroon avatar Dec 31 '19 13:12 pckroon

Alright... then I have an idea for the new test_fit_result file :)

But I won't work on it until next year :P

Guten Rutsch (Happy New Year) :)

antonykamp avatar Dec 31 '19 15:12 antonykamp

Hey, I'm so sorry I was away for so long. Papers, courses and exams stood between the pull request and me. Now I have no such big projects until October. Perhaps this conversion will be done by then (at least that's my goal).

Does it make sense to continue working on this PR, or have there been many new tests in the meantime?

antonykamp avatar Jun 14 '20 16:06 antonykamp

Heyhey,

it's been quiet here as well. If you want to pick this up again then by all means go ahead. Just to make sure, merge master first, but I don't expect any changes.

pckroon avatar Jun 15 '20 09:06 pckroon

Before I start, I want to sum up the requested changes:

test_finite_difference: Parametrization should be applied to the method test_model. The problem was the number of required fixtures for the variables and parameters; a solution could be an alphabet in conf.py.

test_fit_result: The goal is one big method in which the six models are tested. The problems were the missing pytest feature and the diversity of the models.

So far correct?

antonykamp avatar Jun 15 '20 13:06 antonykamp

I think so, yes. Start with the simpler one (test_finite_difference) before diving into the other. Maybe something like this would work:

@pytest.mark.parametrize("model, expected_values, initial_values, xdata, noise", (
    [a*x**2 + b*x + c, {'a': 1, 'b': 2, 'c': 3}, {'a': 0, 'b': 0, 'c': 0}, {'x': np.arange(10)}, 0.05]

)
def test_something(model, expected_values, initial_values, xdata, noise):
    if not initial_values:
        initial_values = expected_values.copy()
    y_data = model(**xdata, **expected_values)
    y_data *= np.random.normal(loc=1, scale=noise, shape=y_data.shape)  # doesn't actually work, but you get the idea
    ... # more things here
    assert result == expected_values

pckroon avatar Jun 15 '20 13:06 pckroon

Alright! The next commit will contain the mentioned conf.py collection of variables and parameters, plus its integration into the individual test files.

antonykamp avatar Jun 15 '20 20:06 antonykamp

I'm still contemplating whether we can condense the rest of the tests in finite_difference into the one parameterised case as well, and whether or not that's a good idea. It'll result in a test method with a lot of arguments...

pckroon avatar Jun 22 '20 07:06 pckroon

In my opinion, test_ODE_stdev could be built into the big method. But the other methods test many more aspects than we could fold in.

Edit: Question: why is the test that checks whether ODE models generate a standard deviation located in test_finite_difference and not in test_ode? If there's no reason, we could move this part to test_ode and additionally add an ODE model to the big parametrization 🙆‍♂️

antonykamp avatar Jun 22 '20 10:06 antonykamp

By the way, I found an unconverted test. Should I convert it in this PR (after fit_result and finite_difference) or later?

antonykamp avatar Jun 22 '20 20:06 antonykamp

Edit: Question: why is the test that checks whether ODE models generate a standard deviation located in test_finite_difference and not in test_ode? If there's no reason, we could move this part to test_ode and additionally add an ODE model to the big parametrization 🙆‍♂️

The original reason is that ODE models require the finite difference methods to be able to calculate a std deviation (because they're not analytically differentiable). I'm ok with reorganising this though, but it would be good to test that we can calculate the Jacobian of an ODEModel, and possibly the stddev.
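A rough sketch of such a test (a minimal example assuming symfit's ODEModel/D API and FitResults.stdev; the decay model, data, and test name are illustrative, not taken from the existing tests):

import numpy as np
from symfit import variables, Parameter, Fit, D, ODEModel


def test_odemodel_stdev_sketch():
    # Simple exponential decay: dy/dt = -k * y, with y(0) = 1.
    t, y = variables('t, y')
    k = Parameter('k', value=1.0)
    model = ODEModel({D(y, t): -k * y}, initial={t: 0.0, y: 1.0})

    tdata = np.linspace(0, 5, 50)
    ydata = np.exp(-0.5 * tdata)

    fit_result = Fit(model, t=tdata, y=ydata).execute()
    # The stddev relies on a finite-difference Jacobian, since ODE models
    # are not analytically differentiable.
    assert fit_result.stdev(k) is not None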

By the way, I found an unconverted test. Should I convert it in this PR (after fit_result and finite_difference) or later?

Leave it for later. Keep PRs small.

pckroon avatar Jun 23 '20 11:06 pckroon

Leave it for later. Keep PRs small.

👍

it would be good to test that we can calculate the Jacobian of an ODEModel, and possibly the stddev.

I added an ODEModel to the parametrization. Checking stddev would require evaluating the fit_result of every model. Then we again have the problem that we can hardly separate the data arrays from the parameters. As long as we don't have a nicer variant for the arrays, we can hardly test stddev.

antonykamp avatar Jun 23 '20 15:06 antonykamp

I'm struggling with the failed TravisCI check. The test passes locally, but not in CI. In C++ I would have blamed the compiler, but in Python?

antonykamp avatar Jun 26 '20 08:06 antonykamp

The case (sf.Model({y: 3 * a * x**2 + b * x * w - c, z: sf.exp(a*x - b) + c*w}), {'a': 3.5, 'b': 2, 'c': 5, 'w': np.arange(10)/10}) is failing. It looks like a numerical issue: https://travis-ci.org/github/tBuLi/symfit/jobs/702160446#L380 You can either change the values of the parameters/variables to be in a region that is numerically more stable, or relax the tolerance a bit.
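For the second option, a small helper along these lines could be used in the test (the helper name and the 1e-4 value are only illustrative):

import pytest


def assert_close(computed, expected, rel=1e-4):
    # Allow a bit more slack than pytest.approx's default relative tolerance
    # for the numerically sensitive case.
    assert computed == pytest.approx(expected, rel=rel)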

pckroon avatar Jun 26 '20 09:06 pckroon

I am currently working on test_fit_result. I noticed some small things:

  1. The results (parameters a, b + r²) differ strongly if I calculate them with a normal Fit, a Fit with likelihood, or with constraints (surprise). I could calculate them with WolframAlpha (or another tool) and put the solutions into the test (?)

  2. likelihood understandably has no r^2. I would check that with hasattr in the test:

if hasattr(fit_result, 'r_squared'):
    assert isinstance(fit_result.r_squared, float)
    ...
  3. I had asked before whether ChainedMinimizer always uses LeastSquares as the objective. Even if it doesn't by default, should I still include it in the test?

  4. Currently we only test fit_result, minpack_result and likelihood for the existence of r^2, objective_result, ... I would add the remaining fits so that everything fits into one test.

antonykamp avatar Jun 30 '20 19:06 antonykamp

I expect test_fit_results to require much more invasive/drastic changes than test_finite_difference.

  1. I think this is expected (since they're different problems), but which test are you referring to exactly?
  2. See also 4.
  3. ChainedMinimizer is just another minimizer, and can be used in combination with any other Objective (such as LeastSquares). ChainedMinimizers are a little funny though, since they run multiple minimizers in order, which affects the produced fit_result.
  4. Great idea. Combined with 2, maybe you can parametrize the tests with a dict of expected attribute/value pairs and use getattr (for attr_name, value in attributes.items(): assert getattr(fit_result, attr_name) == pytest.approx(value)); a sketch follows below.
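Spelled out, that check could look roughly like this (the helper name is illustrative; fit_result and the attributes dict would come from the test's parametrization):

import pytest


def check_expected_attributes(fit_result, attributes):
    # attributes: mapping of attribute name -> expected numeric value,
    # e.g. {'r_squared': 0.99, 'chi_squared': 1.2} (values illustrative).
    for attr_name, value in attributes.items():
        assert getattr(fit_result, attr_name) == pytest.approx(value)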

pckroon avatar Jul 01 '20 08:07 pckroon

I uploaded the first draft so that we can discuss with concrete examples (with the failures described, and without comments yet) :D

  1. Because we use the constraint Eq(a, b) in two fits, the results don't match the datasets perfectly. And likelihood is likelihood :P
  2. & 4. A dict for the fit results would be helpful too I guess. But I like the idea of getattr!
  3. So ChainedMinimizer is the special kid in the family of Minimizers ;)

In any case, we should calculate the results with WolframAlpha to get a, b, r^2, ...

antonykamp avatar Jul 01 '20 09:07 antonykamp

Hi guys, thanks for keeping up the good work, I've been a bit swamped with work regarding the last stages of my PhD so I've struggled to find time to look into this. I trust @pckroon's verdict on this.

Be sure to remove the German comments from the code before merging though, das ist ja Wahnsinn (that's just madness)! :P

p.s. I'm loving the reduction in lines of code, is this achieved without loss of functionality?

tBuLi avatar Jul 01 '20 13:07 tBuLi

So ChainedMinimizer is the special kid in the family of Minimizers ;)

Definitely.

In any case, we should calculate the results with WolframAlpha to get a, b, r^2, ...

The values as they are in the tests now should be correct/validated somehow. But feel free to double check them.

pckroon avatar Jul 01 '20 13:07 pckroon

p.s. I'm loving the reduction in lines of code, is this achieved without loss of functionality?

Jup. That's the magic of pytest parametrization and one of the reasons why I've been pushing for this. In addition, it helps check that you cover all combinations.

pckroon avatar Jul 01 '20 13:07 pckroon

Be sure to remove the German comments from the code before merging though, das ist ja Wahnsinn (that's just madness)! :P

Of course :D I pushed a draft including my todos :)

antonykamp avatar Jul 01 '20 14:07 antonykamp