pyro Some distribution tests fail under PyTorch 2.0

Open eb8680 opened this issue 2 years ago • 1 comment

From #3192, the following distribution tests fail under torch>=2.0.0 and should be fixed prior to the next release:

[ ] tests/distributions/test_rejector.py::test_rejector
[ ] tests/distributions/test_stable.py::test_additive
[ ] tests/infer/reparam/test_stable.py::test_stable
[ ] tests/infer/reparam/test_stable.py::test_symmetric_stable
[ ] tests/infer/reparam/test_stable.py::test_distribution

@martinjankowiak says in #3192 that:

i think we can safely declare that and the stable errors as failing due to flakiness resulting from small numerical differences in ops

So presumably it should be sufficient to tweak the test tolerances until the tests pass more reliably.
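To illustrate why a 0.05 p-value cutoff is flaky by construction, here is a minimal sketch (not the Pyro test code). When both samples really come from the same distribution, the KS p-value is roughly uniform on [0, 1], so a check of the form `assert pvalue > 0.05` fails in about 5% of runs per parametrization. The helper name `false_failure_rate`, the sample sizes, and the use of normal samples are assumptions for illustration; only `scipy.stats.ks_2samp` is taken from the log below.

```python
import numpy as np
from scipy.stats import ks_2samp


def false_failure_rate(threshold, trials=200, n=500, seed=0):
    """Fraction of trials in which two samples drawn from the SAME
    distribution are rejected by a `pvalue > threshold` check.

    Hypothetical helper for illustration; not part of Pyro's test suite.
    """
    rng = np.random.default_rng(seed)
    failures = 0
    for _ in range(trials):
        x = rng.standard_normal(n)
        y = rng.standard_normal(n)
        # Same shape of check as in the failing tests: pass only if
        # the p-value is strictly above the threshold.
        if not ks_2samp(x, y).pvalue > threshold:
            failures += 1
    return failures / trials


# Same seed, so both calls see identical samples; only the cutoff differs.
rate_loose = false_failure_rate(0.05)
rate_strict = false_failure_rate(0.001)
print(rate_loose, rate_strict)
```

Lowering the threshold or fixing the random seed would reduce such spurious failures, at the cost of a less sensitive test.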

Log with tracebacks:

FAILED tests/distributions/test_stable.py::test_additive[0.5-0.1-0.9--0.5-0.5] - assert 0.03662861114079258 > 0.05
 +  where 0.03662861114079258 = KstestResult(statistic=0.02, pvalue=0.03662861114079258, statistic_location=-0.10905032398437975, statistic_sign=-1).pvalue
 +    where KstestResult(statistic=0.02, pvalue=0.03662861114079258, statistic_location=-0.10905032398437975, statistic_sign=-1) = ks_2samp(tensor([  4.1167,  -3.4120,   5.0946,  ...,  -2.1908,   0.6321, -41.7160]), tensor([-23.4938, 778.4523,  -1.4387,  ...,  34.6576,  20.2388,  -7.4209]))
FAILED tests/distributions/test_stable.py::test_additive[0.99-0.1-0.9--0.5--0.5] - assert 0.03966362560959423 > 0.05
 +  where 0.03966362560959423 = KstestResult(statistic=0.0198, pvalue=0.03966362560959423, statistic_location=-31.744199393953476, statistic_sign=-1).pvalue
 +    where KstestResult(statistic=0.0198, pvalue=0.03966362560959423, statistic_location=-31.744199393953476, statistic_sign=-1) = ks_2samp(tensor([-31.1715, -34.2096, -31.3321,  ..., -32.4412, -32.3797, -37.9718]), tensor([-34.7358, -27.3010, -32.6543,  ..., -29.4273, -29.3065, -34.3996]))
FAILED tests/distributions/test_stable.py::test_additive[0.99-0.1-0.9--0.5-0.5] - assert 0.03966362560959423 > 0.05
 +  where 0.03966362560959423 = KstestResult(statistic=0.0198, pvalue=0.03966362560959423, statistic_location=-24.982819826249884, statistic_sign=-1).pvalue
 +    where KstestResult(statistic=0.0198, pvalue=0.03966362560959423, statistic_location=-24.982819826249884, statistic_sign=-1) = ks_2samp(tensor([-24.7473, -25.2959, -24.9695,  ..., -26.0039, -25.6245, -30.5095]), tensor([-27.9319, -19.4133, -26.0759,  ..., -22.2931, -22.2170, -27.6713]))
FAILED tests/distributions/test_stable.py::test_additive[0.99-0.1-0.9--0.5-0.9] - assert 0.04822817851247244 > 0.05
 +  where 0.04822817851247244 = KstestResult(statistic=0.0193, pvalue=0.04822817851247244, statistic_location=-22.396722620761377, statistic_sign=-1).pvalue
 +    where KstestResult(statistic=0.0193, pvalue=0.04822817851247244, statistic_location=-22.396722620761377, statistic_sign=-1) = ks_2samp(tensor([-22.1886, -21.8388, -22.4183,  ..., -23.4237, -22.8978, -27.6130]), tensor([-25.2121, -16.2496, -23.4453,  ..., -19.4321, -19.3740, -24.9817]))
FAILED tests/distributions/test_stable.py::test_additive[1.01-0.1-0.9--0.5--0.9] - assert 0.04126178292409227 > 0.05
 +  where 0.04126178292409227 = KstestResult(statistic=0.0197, pvalue=0.04126178292409227, statistic_location=34.410527012683424, statistic_sign=-1).pvalue
 +    where KstestResult(statistic=0.0197, pvalue=0.04126178292409227, statistic_location=34.410527012683424, statistic_sign=-1) = ks_2samp(tensor([34.9792, 31.2549, 34.8617,  ..., 33.7577, 33.7210, 28.1077]), tensor([31.4479, 37.9496, 33.4788,  ..., 36.4092, 36.5678, 31.7427]))
FAILED tests/distributions/test_stable.py::test_additive[1.01-0.1-0.9--0.5--0.5] - assert 0.03518876957795689 > 0.05
 +  where 0.03518876957795689 = KstestResult(statistic=0.0201, pvalue=0.03518876957795689, statistic_location=31.911593241318528, statistic_sign=-1).pvalue
 +    where KstestResult(statistic=0.0201, pvalue=0.03518876957795689, statistic_location=31.911593241318528, statistic_sign=-1) = ks_2samp(tensor([32.4639, 29.6834, 32.3071,  ..., 31.2335, 31.2668, 25.9957]), tensor([29.0656, 35.9287, 31.0151,  ..., 34.1378, 34.2820, 29.3342]))
FAILED tests/distributions/test_stable.py::test_additive[1.01-0.1-0.9--0.5-0.9] - assert 0.04639733883233797 > 0.05
 +  where 0.04639733883233797 = KstestResult(statistic=0.0194, pvalue=0.04639733883233797, statistic_location=23.462327711829396, statistic_sign=-1).pvalue
 +    where KstestResult(statistic=0.0194, pvalue=0.04639733883233797, statistic_location=23.462327711829396, statistic_sign=-1) = ks_2samp(tensor([23.6211, 24.0275, 23.3972,  ..., 22.4239, 22.9014, 18.4463]), tensor([20.7213, 28.8884, 22.3887,  ..., 26.2179, 26.3120, 20.8984]))
FAILED tests/infer/reparam/test_stable.py::test_stable[LatentStableReparam-(4,)] - AssertionError: tensor([[-0.6467, -0.5359, -0.5765, -0.1425],
        [ 3.6042,  3.6126,  3.5015,  3.9583],
        [ 1.4299,  0.9980,  1.2120,  1.1910],
        [ 1.4928,  1.0505,  1.2678,  0.8807],
        [ 2.0543,  1.8109,  1.8507,  1.4393],
        [ 4.7879,  4.6888,  4.6451,  4.2709]], grad_fn=<CatBackward0>) vs tensor([[-0.5921, -0.5500, -0.5829, -0.1410],
        [ 3.5901,  3.6017,  3.5045,  3.9636],
        [ 1.4160,  0.9878,  1.2189,  1.1884],
        [ 1.4777,  1.0416,  1.2741,  0.8781],
        [ 2.0403,  1.8027,  1.8583,  1.4356],
        [ 4.7713,  4.6789,  4.6547,  4.2681]], grad_fn=<CatBackward0>)
FAILED tests/infer/reparam/test_stable.py::test_symmetric_stable[(4,)] - AssertionError: tensor([-0.0403, -0.1768, -0.0293,  0.0101]) vs tensor([-0.0199,  0.1129, -0.0009, -0.0492])
FAILED tests/infer/reparam/test_stable.py::test_distribution[LatentStableReparam-0.1--0.5] - assert 0.04393955208216105 > 0.05
 +  where 0.04393955208216105 = KstestResult(statistic=0.013800000000000034, pvalue=0.04393955208216105, statistic_location=-0.8912201065651693, statistic_sign=-1).pvalue
 +    where KstestResult(statistic=0.013800000000000034, pvalue=0.04393955208216105, statistic_location=-0.8912201065651693, statistic_sign=-1) = ks_2samp(tensor([-4.0060e-01, -3.4117e+03, -9.7309e+00,  ...,  7.9049e-02,\n        -8.5932e+10, -1.9342e-01]), tensor([7.8844e-02, 4.1522e-02, 6.1860e+07,  ..., 1.0619e+06, 4.6877e+03,\n        7.4023e-02]))
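
For the two `AssertionError` cases, the size of the mismatch can be read straight off the log. As a rough sketch, using only the first row of each tensor from the `test_stable[LatentStableReparam-(4,)]` failure above, the snippet below computes the largest elementwise difference and checks it against two candidate absolute tolerances. The tolerance values 0.05 and 0.1 and the helper names are hypothetical, not the values used in the Pyro test suite.

```python
# First row of each tensor in the test_stable[LatentStableReparam-(4,)]
# failure, copied from the log.
actual = [-0.6467, -0.5359, -0.5765, -0.1425]
expected = [-0.5921, -0.5500, -0.5829, -0.1410]


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


def all_close(a, b, atol, rtol=0.0):
    """Elementwise |a - b| <= atol + rtol * |b| (illustrative helper)."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


gap = max_abs_diff(actual, expected)
print(round(gap, 4))                        # 0.0546
print(all_close(actual, expected, atol=0.05))  # False
print(all_close(actual, expected, atol=0.1))   # True
```

This only covers one row; the right tolerance for the real test depends on the full tensors and on how much run-to-run variation is acceptable.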

May 18 '23 17:05 eb8680

I can take a look at those since I wrote the failing tests.

May 18 '23 17:05 fritzo
