
testing SS with data from an operating model?

Open k-doering-NOAA opened this issue 3 years ago • 7 comments

Thanks to Bai Li and Matthew Supernaw who are setting up tests of MAS using data from an operating model.

We could experiment with testing SS with data set(s) from an operating model. In this case, we would know what the true values are supposed to be, so we could check that SS is getting reasonable results for all the values.

The positive aspect is that if the operating model is not based on SS, we can be a little more sure that SS is producing the "correct" results (as long as we trust that the original operating model is correctly set up). This is in comparison to our current checks based on previous runs of SS, which only check that results haven't changed since the last "acceptable" model run.

The difficulty is that it may be harder to decide what tolerances to accept, and if we want to run the model multiple times, run times could be long.
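One way to frame the tolerance question, as a minimal sketch only: run the estimation model on several simulated replicates and compare the median relative error of a quantity against a tolerance. The function name, the 10% tolerance, and the numbers below are hypothetical and are not taken from the SS or MAS test suites.

```python
from statistics import median


def median_relative_error(estimates, true_value):
    """Median of (estimate - truth) / truth across replicate runs."""
    if true_value == 0:
        raise ValueError("relative error is undefined for a true value of 0")
    return median((est - true_value) / true_value for est in estimates)


# Hypothetical spawning biomass estimates from five replicate runs
ssb_estimates = [980.0, 1015.0, 1002.0, 1040.0, 995.0]
ssb_true = 1000.0  # value known from the operating model

mre = median_relative_error(ssb_estimates, ssb_true)
within_tolerance = abs(mre) <= 0.10  # assumed tolerance, to be tuned
```

Using the median rather than a single run makes the check less sensitive to one unlucky replicate, at the cost of the longer run times mentioned above.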

This might amount to setting up Bai's model comparison project work to run more frequently with the current SS version. If it takes a long time to run, we would probably want to run it less often than every commit (maybe on a schedule, for example weekly or monthly? Or maybe only before a release?)

I think exploring a test like this could provide added value.

k-doering-NOAA, May 06 '21 21:05

I see some merit in having such a data set in our testing repertoire, but it would be a very simple test (no lengths or any of the SS bells & whistles). Also, Bai's model comparison project ended up finding a difference between the SS formulation and BAM's formulation. Neither was wrong, but some transformation was needed to compare them correctly. I think I see more merit in using ss3sim to generate some known data sets.

Rick-Methot-NOAA, May 06 '21 21:05

Yes, the ss3sim way is another approach that could be helpful. The issue there is that, because ss3sim depends on SS, a bug in the operating model will also be in the estimation model. But we would have the flexibility to easily generate complex setups.

k-doering-NOAA, May 06 '21 22:05

@Bai-Li-NOAA do you see value in this? I just came across this issue. I think it may only be worth setting up if additional features have been added to the OM since the model comparison project. I thought you and Matthew were working on adding some spatial capabilities, but I may not be remembering correctly...

k-doering-NOAA, Apr 04 '22 19:04

@k-doering-NOAA, as you and Rick mentioned, the model comparison project OM is very simple and could not help verify the accuracy of complex stock assessments. But you could set up one R integration test that loads OM data, uses r4ss to set up stock assessment input data, runs SS3, and compares SS3 estimates with OM values. This could serve as a regression test to ensure that the tested r4ss and SS3 still perform the same after a change. Happy to help if you decide to add a test.
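The comparison step of such a test could look roughly like the sketch below. It is written in Python purely for illustration (the test described above would be in R with r4ss). Loading the OM data and running SS3 are left out, and the function name, quantity names, values, and 5% tolerance are all hypothetical.

```python
def compare_to_om(om_values, em_estimates, rel_tol=0.05):
    """Return {name: relative error} for quantities outside the tolerance.

    Assumes every OM value is nonzero, since relative error divides by it.
    """
    failures = {}
    for name, truth in om_values.items():
        if name not in em_estimates:
            raise KeyError(f"no estimate for {name!r}")
        rel_err = (em_estimates[name] - truth) / truth
        if abs(rel_err) > rel_tol:
            failures[name] = rel_err
    return failures


# Hypothetical OM "true" values and the corresponding model estimates
om = {"R0": 1000.0, "SSB_final": 5000.0, "F_final": 0.20}
em = {"R0": 1020.0, "SSB_final": 4900.0, "F_final": 0.23}

failures = compare_to_om(om, em)  # only F_final exceeds the 5% tolerance
```

A test would then assert that `failures` is empty, and print the offending quantities when it is not.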

The OM has a branch that includes spatial capabilities, but it is for spatial reference points research only and has not been tested at all...

Regarding ss3sim, you may be able to test the estimation engine in SS3 by using ss3sim and share the experience with the FIMS group. For FIMS, we may not always be able to find an independent OM to generate perfect input data for testing. How to test FIMS using data simulated from FIMS itself will be an open question.

Bai-Li-NOAA, Apr 05 '22 15:04

Perhaps SS3 OM w/ FIMS EM, and vice versa, could be a good testing protocol.

Rick-Methot-NOAA, Apr 05 '22 15:04

Thanks @Bai-Li-NOAA! On the one hand, having a simple regression test where the OM is not the same as the EM is appealing, but on the other, I'm not sure how much we would gain given that we already have other regression tests.

k-doering-NOAA, Apr 05 '22 15:04

Agree with both of you! Using an SS3 OM to test FIMS is documented in the FIMS testing plan. @k-doering-NOAA, you are right, you already have a suite of regression tests. We used the OM to test MAS and FIMS because they do not have such a suite of regression tests yet.

Bai-Li-NOAA, Apr 05 '22 15:04