Create automated validation for Satellite analysis

Open tibbetts opened this issue 9 years ago • 6 comments

We need automated checks that statements in the text of the notebook stay true. Specifically:

  • the top k results from each "ORDER BY" (similarity, probability of X) obtained from high-quality analyses cohere with the top k from the release amount of analysis
  • add test that the top-k are stable
  • (skippable, since not yet triggered) add tests for >0.75 and <0.25 for the various entries in the depprob heatmap that are referred to in the text
  • (skippable, since not yet triggered) add tests for some subset of the statistics of INFER CONF results that are referred to in the text
  • double-check that there are no other untested, non-skippable dependencies between the text and the content
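A minimal sketch of what the top-k coherence check could look like, assuming each query result has already been reduced to an ordered list of row names (best first). The function names and the `min_overlap` threshold are hypothetical and not part of bayeslite or bdbcontrib; in a real test the two lists would come from running the same "ORDER BY" query against the high-quality and the release analyses.

```python
def top_k_overlap(reference, candidate, k):
    """Fraction of the reference top-k that also appears in the candidate top-k.

    Both arguments are ordered lists of row names, best first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    ref_top = set(reference[:k])
    cand_top = set(candidate[:k])
    if not ref_top:
        # Nothing to compare against, so trivially coherent.
        return 1.0
    return len(ref_top & cand_top) / float(len(ref_top))


def assert_top_k_coheres(reference, candidate, k, min_overlap=0.8):
    """Fail if too little of the reference top-k survives in the candidate.

    min_overlap is an illustrative threshold, not a value from the notebook.
    """
    overlap = top_k_overlap(reference, candidate, k)
    assert overlap >= min_overlap, (
        "top-%d overlap %.2f is below %.2f" % (k, overlap, min_overlap))
    return overlap
```

Comparing sets rather than exact order is a deliberate choice here: it tolerates reshuffling inside the top k, which seems closer to "cohere" than to "identical".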

Sep 21 '15 16:09 tibbetts

One example is the notebook that was checked in as https://github.com/probcomp/bdbcontrib/blob/54d3025c7bd930d77d3fa3d25c89f414d611b9d8/examples/satellites/Satellites.ipynb, where longitude_in_radians_of_geo is not clustered with geopolitics.

Sep 21 '15 21:09 tibbetts

Additional instances of potential instability:

  • We once had the ISS shown as weirdest by expected_lifetime; some time later we documented Sircal 1A as the weirdest; now it is not shown as weirdest.
  • Sircal 1B's rank of satellites similar to Sircal 1A varies from 2 to at least 10
  • The GEO periods that are obvious data entry errors (23.96 minutes) are not stably at the top of the anomalous periods query
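One way to quantify this kind of instability is to record where a given row lands in the same query across several independent analyses. The sketch below assumes each run has been reduced to an ordered list of row names; `rank_of` and `rank_range` are hypothetical helpers, not bdbcontrib functions.

```python
def rank_of(name, ranking):
    """1-based position of name in an ordered result list, or None if absent."""
    try:
        return ranking.index(name) + 1
    except ValueError:
        return None


def rank_range(name, rankings):
    """(best, worst) 1-based rank of name across several runs.

    Runs in which the name does not appear at all are ignored; returns None
    if it appears in no run.
    """
    ranks = [rank_of(name, ranking) for ranking in rankings]
    ranks = [r for r in ranks if r is not None]
    if not ranks:
        return None
    return min(ranks), max(ranks)
```

A wide range for a row that the text singles out would be a sign that the statement should not be in the notebook as written.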

Sep 21 '15 21:09 axch

@riastradh-probcomp says (in probcomp/bdbcontrib#42, which is a duplicate of this): (a) We need to determine how to assess the stability of phenomena for our demos. (b) We need to find stable phenomena for our demos. (c) We need to automatically test these in our demos.

Sep 21 '15 21:09 axch

Hypothesis: all the predictive probability of Orion 6, SDS-III 6, and SDS-III 7 periods comes from positing that they are singletons

  • Evidence: the values are all the same
  • DSP 20 is lower?

Oct 27 '15 22:10 axch

More outstanding questions on what exactly we want to validate:

  • How do we check stability of the thing we want to say about the dependence probability diagram? Specific queries that some groups of entries are high and others low?
  • If 1306.29 minute satellites are at the top of the "unlikely periods of GEO satellites" query, why are we not talking about them?
  • We could go about reviving the predicted lifetime segment. Some potentially useful probe functions:
    • how often does the ISS have the weirdest predicted lifetime? top 5?
    • what about other weird satellites? how stable are the predictive probabilities?
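The "how often ... top 5?" probes above could be sketched as a frequency over repeated runs. This assumes each run is an ordered list of satellite names, weirdest first; `top_k_frequency` is a hypothetical name, and producing the runs (for example, by re-analyzing with different seeds) is left out.

```python
def top_k_frequency(name, rankings, k):
    """Fraction of runs in which name appears among the first k results."""
    if not rankings:
        raise ValueError("need at least one run")
    hits = sum(1 for ranking in rankings if name in ranking[:k])
    return hits / float(len(rankings))
```

With k=1 this answers "how often is it the weirdest", and with k=5 "how often is it in the top 5". A test could then require the frequency to stay above some chosen floor.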

Oct 27 '15 22:10 axch

There is now some stability code in bdbcontrib/examples/satellites -- thanks @axch!

What it lacks (apart from being a particular set of queries that may or may not stay in the notebook) is a set of boundary conditions on what is acceptable. The suggestion is to look at the existing values, assert that they won't get (much) bigger, and push on those boundaries until the tests are no longer flaky while we are still seeing the answers we want.
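As a rough illustration of such boundary conditions, the sketch below compares observed stability measurements against per-measurement (low, high) bounds and reports the violations. The helper name and the shape of the dictionaries are assumptions for illustration only; the actual bounds would have to come from the existing values.

```python
def out_of_bounds(observed, bounds):
    """Return {name: (value, low, high)} for every measurement outside its bounds.

    observed maps measurement names to numbers; bounds maps the same names
    to inclusive (low, high) pairs. An empty result means everything passed.
    """
    failures = {}
    for name, (low, high) in bounds.items():
        value = observed[name]
        if not (low <= value <= high):
            failures[name] = (value, low, high)
    return failures
```

A test would then assert that the returned dictionary is empty, so a failure message lists every violated bound at once instead of stopping at the first.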

Nov 18 '15 22:11 gregory-marton