Hunt down outstanding warts in the inference quality build
-
[ ] venture-performance failed test.performance.asymptotics.test_references.testReferencesProgram1 with
AssertionError: Runtime of f is growing. Times: [(512, 0.02891600000003791), (614, 0.029139999999983957), (736, 0.029582000000004882), (883, 0.030401999999980944), (1059, 0.031723000000056345), (1270, 0.032411000000024615), (1524, 0.03531199999997625), (1828, 0.03521499999999378), (2193, 0.03660300000001371), (2631, 0.03697599999998147), (3157, 0.03917699999999513)] Differences: [0.00044200000002092565, 0.0008199999999760621, 0.0013210000000754007, 0.0006879999999682695, 0.0029009999999516367, -9.699999998247222e-05, 0.0013880000000199288, 0.00037299999996776023, 0.0022010000000136642]
-
[ ] puma-performance stochastically fails at test.performance.asymptotics.test_double_recursion.test_double_recursion
AssertionError: Runtime of f is growing too fast. Times: [(128, 0.131216), (153, 0.163983), (183, 0.19659300000000002), (219, 0.2397879999999999), (262, 0.2920639999999999), (314, 0.3594019999999998), (376, 0.4368019999999997), (451, 0.5348049999999995), (541, 0.6699250000000001), (649, 0.8513019999999996), (778, 1.0269719999999998), (933, 1.2861319999999994), (1119, 1.6264889999999994)] Ratios: [0.0013106799999999996, 0.0011886727272727277, 0.0011930989010989, 0.001200358208955223, 0.001226806451612902, 0.0012322016129032245, 0.0012495015479876146, 0.0013043801452784507, 0.001382122840690978, 0.0013780861538461534, 0.0014346782608695644, 0.0015088526740665987]
- stochastic but fairly common
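Both failures come from the asymptotics harness measuring runtime over increasing input sizes and asserting the trend is not growing. A minimal sketch of that style of check (my own reconstruction for illustration, not the actual test.performance.asymptotics code; `looks_linear_or_better` and its tolerance are hypothetical):

```python
def growth_ratios(times):
    """times: list of (input_size, runtime) pairs, as in the logs above.
    Returns runtime per unit of input for each measurement."""
    return [t / n for n, t in times]

def looks_linear_or_better(times, tol=1.5):
    """Crude linearity check: if runtime is O(n), the per-unit ratios are
    roughly flat, so the last ratio should not greatly exceed the first.
    A superlinear f makes the ratios trend upward and trips the bound;
    timing noise on a shared build machine makes the verdict stochastic."""
    rs = growth_ratios(times)
    return rs[-1] <= tol * rs[0]
```

The puma-performance failure above has exactly this shape: the reported ratios drift upward toward ~0.00151, which a tight enough tolerance flags as "growing too fast", and measurement noise explains why it only fails some of the time.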
-
[ ] lite-rejection-inference-quality also fails in test.conformance.test_foreign_sp.test_foreign_latents_infer
- stochastically
-
[ ] lite-rejection-inference-quality failed in test.conformance.sps.test_gp.testGPMean1
- stochastically
-
[ ] And puma-pgibbs-inference-quality in test.inference_quality.micro.test_misc_aaa.testMakeBetaBernoulli1('(lambda (a b) (let ((weight (beta a b))) (make_suff_stat_bernoulli weight)))', '(normal 10.0 1.0)')
- stochastically
-
[ ] Are we interested in trying to re-enable lite-func-pgibbs-quality-test by raising the number of particles and transitions?
- puma pgibbs works on 4 particles 10 transitions; lite fails on 4 particles 5 transitions
- (edit 2/4/16): lite-func-pgibbs-quality-test with 10 particles and 20 transitions takes five hours, and in four runs passed once, failed on test.inference_quality.micro.test_misc_aaa.testMakeBetaBernoulli2('(lambda (a b) (let ((weight (beta a b))) (make_suff_stat_bernoulli weight)))',) once, and failed on test.inference_quality.micro.test_misc_aaa.testMakeBetaBernoulli1('(lambda (a b) (let ((weight (beta a b))) (make_suff_stat_bernoulli weight)))', '10.0') twice.
- previous runs with 4 particles and 5 transitions were failing on those tests and a few others here and there
- it is possible that Puma is doing better because it is skipping these tests (if so, that would be because make_suff_stat_bernoulli is Lite-only, the Lite-SPs-in-Puma interface is not thread-safe, and this particular inference program requests multithreaded pgibbs).
-
[ ] A while ago I had notes "something is definitely wrong with the lite misc inference quality build"
- test.smc.test_particle_filter.testBasicParticleFilter2
-
[ ] The inference quality builds had a period of repeatedly getting scrod by import errors. Is Jenkins's virtual env getting clobbered by something?
- I hypothesize that a new run of venture-crashes will now clobber the inference quality builds by uninstalling venture out from under them.
- I can't find any pip options to make it not do that: there is no --dependencies-only, there is no --do-not-uninstall-just-the-requested-package
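One workaround sketch (an assumption about the build layout, not a pip feature): have the crash build install only venture's dependencies, e.g. by filtering the requires.txt that `python setup.py egg_info` emits and feeding the result to `pip install -r`, so pip never touches the already-installed venture package. The helper below is hypothetical:

```python
def plain_requirements(requires_txt):
    """Extract the unconditional requirements from a setuptools
    requires.txt body, dropping blank lines and any [extra] sections,
    so they can be installed with `pip install -r` without
    (re)installing the package that declared them."""
    reqs = []
    for line in requires_txt.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('['):  # extras section; stop at the first one
            break
        reqs.append(line)
    return reqs
```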
-
[x] Slice sampling occasionally produces crazy outliers (e.g., thousands of sigma out on a Gaussian), suggesting some numerical screw-up. This one is rare enough that it's probably blocked on #139.
-
[ ] (new 2/4/16): Lite func pgibbs with
-
[x] Concern on slice sampling: Would K-S testing be sensitive enough to the extreme outliers that our slice sampler occasionally generates? (Right now they are caught by mean comparison testing.) Do we want to explicitly check means and/or standard deviations too? (At least for distributions for which they are known?)
- Slice sampling has been solved and documented (#448).
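To make the K-S sensitivity question concrete: a single extreme outlier moves a one-sample K-S statistic by at most ~1/n (one empirical-CDF step), while it moves the sample mean by outlier/n. A self-contained sketch using standard definitions (helper names are my own):

```python
import math
import random

def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDF of the sample and the reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        c = cdf(x)
        d = max(d, abs((i + 1) / n - c), abs(i / n - c))
    return d

random.seed(0)
clean = [random.gauss(0.0, 1.0) for _ in range(1000)]
dirty = clean[:-1] + [1000.0]  # one "thousands of sigma out" outlier

ks_clean = ks_statistic(clean, normal_cdf)
ks_dirty = ks_statistic(dirty, normal_cdf)
mean_clean = sum(clean) / len(clean)
mean_dirty = sum(dirty) / len(dirty)
# The K-S statistic shifts by only about 1/n, but the mean jumps by
# roughly outlier/n = 1.0 -- so mean comparison catches the outlier
# where a K-S test would likely not.
```

This supports the note's observation: the outliers are caught by mean comparison, and a K-S test alone would be a weak detector for them.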