DrWatson.jl

Testing for scientific projects

Open carstenbauer opened this issue 4 years ago • 22 comments

As the conversation in https://github.com/JuliaDynamics/DrWatson.jl/issues/89 was rather emotional, I am opening a new issue for this.

I get, and agree, that the output of PkgTemplates.jl is too much for DrWatson. But shouldn't a scientific project have unit tests for the general functionality in src/ as well? IMHO this is not related to packaging things up but just a good programming practice.

Looking forward to hearing your opinion.

carstenbauer avatar Feb 19 '20 16:02 carstenbauer

There is also #67 where we had a discussion about that. The issue also contains the steps needed for running the tests.

In principle I agree with you that scientific projects also need testing; however, I mostly restrict the tests to actual packages. My src folder usually contains packages which have their own test folder. I like that more because it's simpler to move the packages out of the DrWatson project in case I need the functionality somewhere else. If I encounter a case that's not covered by the tests, I try to simplify it and then move it to the package in src. I also find it quite difficult to follow TDD for scientific projects, at least in my field.
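For concreteness, a minimal sketch of how the tests of such a package in src/ can be run (the package name is made up; any package with its own Project.toml and test/runtests.jl works the same way):

using Pkg

# Each package kept in src/ carries its own Project.toml and test/runtests.jl,
# so its tests can be run independently of the surrounding DrWatson project.
Pkg.activate(joinpath("src", "MyAnalysisTools"))  # hypothetical package name
Pkg.test()                                        # runs src/MyAnalysisTools/test/runtests.jl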

There was also #90 and given the number of exclamation marks in the commit message I doubt that tests will come back ;)

sebastianpech avatar Feb 19 '20 17:02 sebastianpech

I see, fair enough. I didn't see those PRs. Let's close this then.

carstenbauer avatar Feb 19 '20 17:02 carstenbauer

Hah, I have a really unpopular opinion about this apparently!

But shouldn't a scientific project have unit tests for the general functionality in src/ as well?

Not really. I don't write "tests" for my science projects, and not a single scientist I have ever interacted with so far does either. You two are a pleasant surprise, I must admit; I didn't expect anyone to be doing this.

For me, the reason not to write tests is that my project evolves so much through time and changes constantly: I'd much rather spend the time I would spend re-writing tests on doing new science instead. I feel like a scientific project is too specific in its scope for one to bother with tests: who do they help? They don't help the author, because the author has checked everything by "plotting and seeing if it makes sense".

If I develop a concise and self-contained enough piece of functionality, then I'll make it into a package anyway. I mean on GitHub, not just in my src folder. There I write tests as much as possible. But I find no reason to test my scripts and stuff... Of course, this is how I do it. It doesn't mean it is correct or anything... I just think writing tests for a scientific project is not time well spent.

IMHO this is not related to packaging things up but just a good programming practice.

That is very much true, but the target of DrWatson is good scientific practice which is orthogonal to good programming practice. Having a git commit in your data and knowing exactly which point of the code they come from is enough to establish reproducibility, without caring whether you have "pretty" code or not.

Be aware that most scientists have a programming skill level that is very near 0. For many, "unit tests" and "CI" are unknown words. That is why I am always reluctant to add such concepts here; I'd much rather that such concepts come from packages that target programming practices. The goal of DrWatson has become clearer and clearer through the months, and it targets scientists in general.

Datseris avatar Feb 19 '20 18:02 Datseris

I'd also like to add that we should differentiate between unit tests and higher-level tests.

I write unit tests for my packages, but for scientific projects I do not see any benefit. Every functionality I use in my scientific projects is extracted to separate projects with their own test suite. Most of my DrWatson projects only contain steering scripts to produce and reproduce plots and results. For these, however, I tend to write high-level tests, for which indeed DrWatson does not offer a built-in infrastructure.

Maybe we can think about adding something like that?

tamasgal avatar Feb 19 '20 18:02 tamasgal

It is your call and it's fair enough to not create a test directory. :)

That is very much true, but the target of DrWatson is good scientific practice which is orthogonal to good programming practice.

This I have to disagree with, though. They might not be parallel, but they are certainly not orthogonal. The fact that most scientists don't write tests and ignore all other well-established programming knowledge is a sad truth, but it doesn't establish orthogonality here either. So many bugs in scientific code that I have witnessed could have been prevented with only the most basic unit tests. How often have I helped people get rid of anti-patterns in their code to make it fast and scalable. Again, they might not be parallel, but they certainly aren't orthogonal.

carstenbauer avatar Feb 19 '20 18:02 carstenbauer

In my opinion tests are very important also for scientific projects. Think of refactoring or performance optimization. I wouldn't want to rework my code without being able to check if everything works as before.

sebastianpech avatar Feb 19 '20 18:02 sebastianpech

It is your call and it's fair enough to not create a test directory. :)

No, it isn't entirely my call; other people develop here as well, and in fact e.g. @sebastianpech has frequently convinced me that things I thought were good were actually not so good.

Again, they might not be parallel, but they certainly aren't orthogonal.

Yes, of course, certainly my expression was incorrect. The central point of discussion then is what should be "established" by the default structure here and how. @tamasgal can you explain in more detail what you mean by "high-level tests"?

In my opinion tests are very important also for scientific projects.

Sure, as I quite explicitly said in the beginning of my post, what I wrote was my own, unpopular opinion.


One must also keep in mind the extent of the collaboration the project is part of. The larger the project, the more important testing becomes. But for large projects, I would imagine several "DrWatson projects" to actually be the subcomponents...

Datseris avatar Feb 19 '20 18:02 Datseris

Another point that no one has mentioned so far: not every project needs a src folder. When I started my new job, I was blown away by realizing how few people write their own source code...

Many people use exclusively existing packages and thus only have scripts.

Datseris avatar Feb 19 '20 18:02 Datseris

@tamasgal can you explain in more detail what you mean by "high-level tests"?

Yes, sure. Unit tests are tiny bits of tests, as you know, so you usually have multiple tests per function. Higher-level tests cover more than just tiny portions of a function. In my research projects it's usually something like a script which I run, recording the results and saving them in a file. A high-level test for that script then makes sure that the results stay the same (within a given accuracy). But these are not "unit" tests; it's more like a regression test or a benchmark, which I prefer because it lets me restructure the code or improve the performance (both speed and accuracy) of the analysis while preserving the output.
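For concreteness, a minimal sketch of what such a high-level check could look like (the script name, reference file, and tolerance are made up; it assumes the analysis script returns a Dict of numeric results and that a trusted run has been saved beforehand, e.g. with wsave):

using Test, DrWatson
@quickactivate  # activate the surrounding DrWatson project

# Results recorded from an earlier, trusted run (hypothetical file)
reference = wload(datadir("test_references", "analysis_reference.jld2"))

# Re-run the analysis script; assume its last expression returns a Dict of numeric results
result = include(scriptsdir("analysis.jl"))

@testset "analysis.jl regression" begin
    for (key, ref) in reference
        @test result[key] ≈ ref atol = 1e-8  # allow tiny numerical differences
    end
end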

tamasgal avatar Feb 20 '20 12:02 tamasgal

Okay, but what kind of built-in infrastructure would you expect for this? It seems to me that the way you perform these tests is by running a specific file and seeing if it matches the output that you have saved somewhere.

The most I can imagine being of general use is a function confirm / test / whatever that has arguments

confirm(sourcefile::String, output::Any)

which would do an @test on whether the return value of include(sourcefile) is == to the output... or, if output is a path, then first load it and then check... I don't know how useful this is, because in the second case I would imagine the final output of sourcefile to be some kind of number or whatever, while the loaded output will always be a Dict.
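To make the idea concrete, a rough sketch of what such a function could look like (this is not part of DrWatson; the behavior for the path case is just one possible choice):

using Test, DrWatson

# Run `sourcefile` and test that its return value matches `output`;
# if `output` is a path to a saved file, load the file first with wload.
function confirm(sourcefile::AbstractString, output)
    result = include(sourcefile)  # value of the script's last expression
    expected = (output isa AbstractString && isfile(output)) ? wload(output) : output
    @test result == expected
end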

Datseris avatar Feb 20 '20 12:02 Datseris

I am not sure either. It's quite specific to the use case 😉 Earlier I just used the test/ folder, which was removed. So now I just create it manually.

I would really like to encourage testing and CI/CD for scientific projects, but it's hard to find a basic foundation which is flexible enough.

tamasgal avatar Feb 20 '20 13:02 tamasgal

Sure, these high-level tests sound like something reasonable for a scientific project.

Earlier I just used the test/ folder, which was removed. So now I just create it manually.

How does having a test folder change things...? You still have to do everything manually :S

I would really like to encourage testing and CI/CD for scientific projects

What has to be realized is that this statement entails spending time on something that doesn't help you progress through your project faster. I am sure that you have arguments like "but having a tested project is safer and so you won't have problems in the long run". In principle this is true for large projects with many participants, where long-term stability is very important, but it is wasted time for small projects with few participants, where the project would anyway have changed drastically within a period of some months.

And once again I need to stress this: many, many people do science without a "source folder". But I guess these high-level tests that @tamasgal mentioned are useful for anyone that does science, with or without source.


Regardless, the people have spoken. Having a "kind of framework" to encourage testing is a wanted feature. So, anyone that can come up with a way to do it in a general, non-invasive way, feel free to open a PR, it will be merged for sure (**as long as it doesn't make DrWatson a package manager like in #89**). I suggested this confirm function, but I cannot imagine something more helpful.

Datseris avatar Feb 20 '20 13:02 Datseris

How does having a test folder change things...? You still have to do everything manually :S

I never said it does 😉 I just said: earlier it was there, so I just used that folder. I also saw it as a kind of invitation. But I am not saying I want it back or whatever...

Regarding testing (scientific projects): there are obviously different opinions and that's perfectly fine. As long as everything's reproducible (which is one of the main goals of DrWatson) I don't care if someone ditches tests for whatever reason.

Regardless, the people have spoken. Having a "kind of framework" to encourage testing is a wanted feature. So, anyone that can come up with a way to do it in a general, non-invasive way, feel free to open a PR

My use cases are easily done by hand, and there is not enough repetition to pour them into a package, let alone into DrWatson. But I am definitely curious about some ideas from others.

tamasgal avatar Feb 20 '20 20:02 tamasgal

I have been thinking about this last night, and I'd like to get the opinion of @tamasgal, @sebastianpech, and @crstnbr. The thought goes as follows:

  • The goal of a science project is to lead to some result about the world, presented in numbers, tables, figures, etc.
  • Ideally, this result would be generated with one or several scripts that are runnable within the project's contents.
  • DrWatson enables you to make these results reproducible; at least, I think we have done a good job of making this a thing.
  • If you can actually reproduce the results you want, I would say the project is well tested.

Thus, my guess would be that such a testing interface should really test results, not functions. In the end, in your papers you talk about results, not the functions that were used to produce them.

So I guess

confirm(sourcefile::String, output::Any)

is what could help, which runs your "result producing script" and checks if it indeed matches the result you had produced in the past. This could be cool because it would make your "paper figure producing scripts" immediately "testable" without much effort.
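Hypothetical usage, with a confirm function like the sketch above and made-up file names:

# Re-generate the data behind a figure and compare it against a stored reference
confirm(scriptsdir("make_fig3.jl"), datadir("test_references", "fig3_data.jld2"))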

Datseris avatar Feb 24 '20 09:02 Datseris

I don't think that this will turn out to be a generally applicable method. For me the beauty of unit tests (which I do on a package level in each of the folders in src/) is that they run comparably fast while checking that my algorithm works as I intended. For one of the projects I'm currently working on, testing the output that should go into the paper would mean repeating parameter studies that run for weeks. As I wrote above, for such a project I extract a simplified case and move it to the respective test folder in src/.

However, writing this makes me wonder whether this is the use case we are aiming for. TBH I don't think that anybody will rerun my code if the simulations take that long. Maybe the data I produce in the above example can rather be seen as input and should be distributed with the project. The resulting plots, tables, and numbers are the actual result, which can then be tested. (However, what's the benefit of doing that?)

Where I can see tests on the project level being interesting is for known benchmark / verification examples. However, the result of such an example usually goes into the paper anyway, so I will actually write the test myself as it is an actual result.

So I get your point, but I don't really see a use case (for my work) as I'd rather write unit tests and put them in src/. However, I'm curious what the others say; maybe I'm missing something here.

sebastianpech avatar Feb 24 '20 14:02 sebastianpech

  • The goal of a science project is to lead to some result about the world, presented in numbers, tables, figures, etc.
  • Ideally, this result would be generated with one or several scripts that are runnable within the project's contents.
  • DrWatson enables you to make these results reproducible; at least, I think we have done a good job of making this a thing.
  • If you can actually reproduce the results you want, I would say the project is well tested.

Yes I fully agree here.

Thus, my guess would be that such a testing interface should really test results, not functions. In the end, in your papers you talk about results, not the functions that were used to produce them.

Indeed. I personally really only use tests in scientific analyses for basically making an automated check of the outputs in combination with performance optimisations. So it's very specific and I think that it's hard to find a commonly applicable workflow.

So I guess

confirm(sourcefile::String, output::Any)

is what could help, which runs your "result producing script" and checks if it indeed matches the result you had produced in the past. This could be cool because it would make your "paper figure producing scripts" immediately "testable" without much effort.

I think that's a good idea, but the testing of the results is quite complicated. Sometimes the output is a tree or so, which requires custom code to check equality. That's the part where I have not found any common practice yet and also don't know if it's worth the effort ;)
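Just to illustrate what I mean, a rough sketch of such a custom recursive comparison (certainly not general enough for every output type):

# Recursive approximate comparison for nested results; rough sketch only
approx_equal(a::Number, b::Number; kwargs...) = isapprox(a, b; kwargs...)
approx_equal(a::AbstractArray, b::AbstractArray; kwargs...) =
    size(a) == size(b) && all(approx_equal(x, y; kwargs...) for (x, y) in zip(a, b))
approx_equal(a::AbstractDict, b::AbstractDict; kwargs...) =
    keys(a) == keys(b) && all(approx_equal(a[k], b[k]; kwargs...) for k in keys(a))
approx_equal(a, b; kwargs...) = a == b  # fallback for strings, symbols, etc.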

tamasgal avatar Mar 12 '20 11:03 tamasgal

Somewhat related, there is https://github.com/Evizero/ReferenceTests.jl to help test against reference files.
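For anyone unfamiliar with it: ReferenceTests.jl is built around the @test_reference macro, which compares a value against a stored reference file. A rough sketch (the file name and compute_results function are made up for illustration):

using Test, ReferenceTests

# Compare the string representation of the current result against the stored reference file
@test_reference "references/results.txt" string(compute_results())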

mauro3 avatar Jun 16 '20 14:06 mauro3

I guess I'm late to the party, but I just wanted to reiterate the desire for this, or something like #352, in my workflow.

My use case is very close to what has been described above, where I write

@testset "Known values" begin
  Sol = SolveBenchmarkModel() # A function that solves the model with some parameters
  @test Sol.foo == 18374.210023
  @test Sol.bar == NaN
  ....
end

While it is annoying to have to rewrite these values whenever I want them to change, the amount of time it has saved me while refactoring, and in particular when speeding up code, is worth it. I started doing this after too many accidents where I made changes that shouldn't change results, eye-balled the results, "verified" that things didn't change, only to see them change later and then having to manually go back and find the guilty commit.

eirikbrandsaas avatar Aug 13 '22 01:08 eirikbrandsaas

can you review #352 @eirikbrandsaas ?

Datseris avatar Aug 13 '22 06:08 Datseris

Depends what you mean by review. No point in me reading the code, but I'd be happy to try to start a new project and check that it works for me, if you tell me how I can run it. (Tried to look at it but have no idea how to run it.)

eirikbrandsaas avatar Aug 13 '22 16:08 eirikbrandsaas

well, sure, can you do that then?

Datseris avatar Aug 13 '22 21:08 Datseris

Yes, if you tell me the steps I have to take to clone/fork/whatever and create the initial project.

eirikbrandsaas avatar Aug 15 '22 12:08 eirikbrandsaas