Include validation/testing as a core part of the spec?

Open — tyarkoni opened this issue 5 years ago

Per discussion with the core team, one problem that's repeatedly come up is what to do about discrepancies in output. It's a foregone conclusion that different simulators/environments will produce different outputs at some level of granularity (if only because of floating point arithmetic differences), so the goal can't realistically be perfect reproducibility across all environments. BUT we do want to have some way of expressing what the intended/expected behavior is. E.g., if you build a model in NEURON, it might not give you exactly the same result if executed in PsyNeuLink, but the person who wrote the model (or someone else) should be able to say "we consider the result valid if, given inputs like this, the outputs look roughly like this, and the fitted/learned model parameters are in this range."

The suggestion here is to have some basic testing/validation annotations be a formal part of the specification. There are many ways to implement this, but the following seems like a reasonable set of core features we may want to include (or at least, discuss here):

  • This should be entirely optional (i.e., including a validation specification does not render a model document invalid), but strongly encouraged.
  • The validation specification should not be tied to any particular environment. I.e., it shouldn't say "you need to run the following routine in NEURON with these arguments in order to determine whether the results are valid." We'll probably want a separate subsection where users can indicate what environment the model was developed/run in, but that's a separate concern—the idea here is to have an environment-agnostic specification that formalizes what conditions a user deems to constitute success.
  • Have a simple way to define discrete test cases. I.e., the approach could be analogous to standard software testing procedures, where you define example inputs, and then state what conditions have to hold on the output and learned model parameters.
  • Make reference to named parameters to the degree possible. For any output variables or internal model parameters that are named, it becomes easy (at least in principle) to specify simple conditions. E.g., "50 < node65.spikeRate < 100".
  • The set of available assertions would (at least initially) be very limited; it's probably not reasonable to let people, e.g., specify that the K-L divergence of some parameter has to be within X of a normal distribution. Limiting to basic logical/comparison operators, and maybe some really basic descriptive statistics (mean, variance, etc.) seems reasonable.
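To make the last two points concrete, here is a minimal sketch of how an implementation might safely evaluate assertion strings like "50 <= node1.spikeRate <= 100" against named values. This is purely illustrative (nothing here is part of any proposed spec): it whitelists a small set of AST node types—comparisons, names, attribute access, calls, and literals—plus a few descriptive-statistics functions, as the bullet above suggests.

```python
import ast
import statistics
from types import SimpleNamespace

# Syntax permitted in an assertion: comparisons, boolean ops, dotted
# names, calls to whitelisted functions, and numeric literals.
ALLOWED_NODES = (ast.Expression, ast.Compare, ast.BoolOp, ast.And, ast.Or,
                 ast.Name, ast.Attribute, ast.Call, ast.Constant, ast.Load,
                 ast.Lt, ast.LtE, ast.Gt, ast.GtE, ast.Eq, ast.NotEq)

# Basic descriptive statistics only, per the bullet above.
ALLOWED_FUNCS = {"mean": statistics.mean, "variance": statistics.pvariance,
                 "min": min, "max": max}

def check_assertion(expr, names):
    """Reject any non-whitelisted syntax, then evaluate expr to a bool.

    `names` maps top-level identifiers (e.g. "node1", "output1") to
    values; dotted references resolve via ordinary attribute access.
    """
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax in assertion: {expr!r}")
    env = dict(ALLOWED_FUNCS)
    env.update(names)
    return bool(eval(compile(tree, "<assertion>", "eval"),
                     {"__builtins__": {}}, env))

# Hypothetical named parameter and output, mirroring the examples above.
node1 = SimpleNamespace(spikeRate=72.0)
print(check_assertion("50 <= node1.spikeRate <= 100", {"node1": node1}))
print(check_assertion("mean(output1) <= 8.6", {"output1": [8.3, 8.5, 8.4]}))
```

Because the grammar is deliberately tiny, anything outside it (lambdas, comprehensions, subscripts, etc.) is rejected before evaluation, which is what makes a limited assertion set tractable to support across environments.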

Putting those ideas together, we might have something like this (this is just intended to convey the idea, not to imply any commitment to this kind of structure/syntax):

{
    "tests": [
        {
            "inputs": {
                "data1": [
                    4,
                    3,
                    1,
                    5
                ],
                "data2": "my_critical_data.txt",
                "epochs": 300
            },
            "assertions": {
                "outputs": [
                    "all(output1) >= 0",
                    "8.3 <= mean(output1) <= 8.6"
                ],
                "parameters": [
                    "50 <= node1.spikeRate <= 100"
                ]
            }
        },
        {
            ...
        }
    ]
}
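For a sense of how an environment might consume such a block, here is a hedged sketch of a test runner: it walks the "tests" list, runs the model on each test's inputs, and evaluates the output and parameter assertions against the results. The `run_model` stub and the flat result dictionaries are assumptions for illustration; a real implementation would dispatch to the host simulator and use a restricted expression evaluator rather than raw `eval`.

```python
import json

SPEC = json.loads("""
{
  "tests": [
    {
      "inputs": {"data1": [4, 3, 1, 5], "epochs": 300},
      "assertions": {
        "outputs": ["min(output1) >= 0"],
        "parameters": ["50 <= spikeRate <= 100"]
      }
    }
  ]
}
""")

def run_model(inputs):
    # Stand-in for an environment-specific simulator call (hypothetical).
    # Returns (named outputs, fitted/learned parameters).
    return {"output1": [8.3, 8.5, 8.4]}, {"spikeRate": 72.0}

def validate(spec):
    """Evaluate every assertion in every test; return (expr, passed) pairs."""
    results = []
    for test in spec["tests"]:
        outputs, params = run_model(test["inputs"])
        env = {"min": min, "max": max, **outputs, **params}
        for group in ("outputs", "parameters"):
            for expr in test["assertions"].get(group, []):
                # Shortcut for the sketch; a real runner would parse and
                # whitelist the expression grammar instead of raw eval.
                results.append((expr, bool(eval(expr, {"__builtins__": {}}, env))))
    return results

for expr, ok in validate(SPEC):
    print(f"{'PASS' if ok else 'FAIL'}: {expr}")
```

Note that nothing in the loop is environment-specific: only `run_model` would differ between NEURON, PsyNeuLink, or any other backend, which is the point of keeping the validation block environment-agnostic.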

tyarkoni — Feb 12 '21 18:02