
Error in computing gaussian_model

Open Karthik-Ragunath opened this issue 2 years ago • 9 comments

The Gaussian mixture model computation from the mean and covariance matrices (obtained from the model's features, nn.feature) constantly throws an error saying that the covariance matrix is not positive definite:

gmm = torch.distributions.MultivariateNormal(
      loc=classwise_mean_features, covariance_matrix=(classwise_cov_features + jitter),
)

Error:

Expected parameter covariance_matrix (Tensor of shape (4, 512, 512)) of distribution MultivariateNormal(loc: torch.Size([4, 512]), covariance_matrix: torch.Size([4, 512, 512])) to satisfy the constraint PositiveDefinite(), but found invalid values:

I am not sure whether I am doing anything wrong. I am using ResNet18, by the way.

Karthik-Ragunath avatar Mar 07 '22 02:03 Karthik-Ragunath

Should we remove the break in the gmm_fit method in gmm_utils.py and iterate until we find a jitter_eps for which the GMM can be computed?

From my observation, the GMMs can be computed once jitter_eps is greater than some threshold.

Karthik-Ragunath avatar Mar 07 '22 03:03 Karthik-Ragunath

That's what the code is supposed to do.

It should reraise exceptions though if the exception isn't matched against the ones we expect, which is missing.

The flow is: if bad covariance matrix causes exception => continue. Break the loop on first success
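A minimal sketch of that flow, including the re-raise for unexpected exceptions, might look like the following. This is not the repository's gmm_fit: the function name `fit_gaussians_with_jitter`, the jitter schedule, and the message substrings used to recognise a "bad covariance" failure are all assumptions for illustration, and the exact exception type and wording can differ between PyTorch versions.

```python
import torch


def fit_gaussians_with_jitter(means, covs, jitters=(0.0, 1e-8, 1e-6, 1e-4, 1e-2, 1.0)):
    """Try increasing jitter values and return on the first success.

    means: (C, D) class-wise means, covs: (C, D, D) class-wise covariances.
    The jitter schedule is hypothetical, not the one used in gmm_utils.py.
    """
    eye = torch.eye(covs.shape[-1], dtype=covs.dtype, device=covs.device)
    for jitter_eps in jitters:
        try:
            gmm = torch.distributions.MultivariateNormal(
                loc=means, covariance_matrix=covs + jitter_eps * eye
            )
        except (ValueError, RuntimeError) as err:
            message = str(err)
            # Assumed markers of a "covariance is not positive definite" failure.
            bad_covariance = (
                "PositiveDefinite" in message
                or "positive-definite" in message
                or "cholesky" in message.lower()
            )
            if bad_covariance:
                continue  # bad covariance matrix: try the next, larger jitter
            raise  # anything else is unexpected and should not be swallowed
        return gmm, jitter_eps  # first success ends the search
    raise ValueError("No jitter value produced a positive-definite covariance.")


# Toy example: one covariance has a negative eigenvalue of -1e-3,
# so only jitter values above that magnitude can succeed.
toy_means = torch.zeros(2, 2)
toy_covs = torch.stack([torch.eye(2), torch.diag(torch.tensor([1.0, -1e-3]))])
toy_gmm, toy_eps = fit_gaussians_with_jitter(toy_means, toy_covs)
```

Returning the jitter that succeeded makes it easy to log how much regularisation was actually needed.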

BlackHC avatar Mar 07 '22 10:03 BlackHC

Hi,

I am also having the same problem. Has anybody solved it?

Thanks a lot, Jeethesh

jeethesh-pai avatar Nov 15 '22 21:11 jeethesh-pai

My uninformed guess would be that PyTorch changed the exception type that is being thrown.

What's the full exception trace etc?

BlackHC avatar Nov 15 '22 21:11 BlackHC

Hey, thanks for the quick reply. Here is the full exception trace:

Traceback (most recent call last):
  File "/lhome/jeumesh/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/lhome/jeumesh/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/lhome/jeumesh/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/lhome/jeumesh/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/lhome/jeumesh/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/lhome/jeumesh/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 322, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/lhome/jeumesh/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 136, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/lhome/jeumesh/.vscode-server/extensions/ms-python.python-2022.18.2/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/lhome/jeumesh/Code/LabelErrorAwareObjectDetection/calc_mean_covariance.py", line 36, in <module>
    GMM_model = MultivariateNormal(class_wise_mean, covariance_matrix=class_wise_covariance)
  File "/lhome/jeumesh/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributions/multivariate_normal.py", line 146, in __init__
    super(MultivariateNormal, self).__init__(batch_shape, event_shape, validate_args=validate_args)
  File "/lhome/jeumesh/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributions/distribution.py", line 56, in __init__
    f"Expected parameter {param} "
ValueError: Expected parameter covariance_matrix (Tensor of shape (20, 2048, 2048)) of distribution MultivariateNormal(loc: torch.Size([20, 1]), covariance_matrix: torch.Size([20, 2048, 2048])) to satisfy the constraint PositiveDefinite(), but found invalid values:
tensor([[[ 8.3330e-05,  7.0327e-05,  6.8997e-05,  ...,  7.0815e-05,
           4.5725e-05,  6.6394e-05],
         [ 7.0327e-05,  7.8389e-05,  7.6081e-05,  ...,  7.7024e-05,
           5.2779e-05,  7.3498e-05],
         [ 6.8997e-05,  7.6081e-05,  7.8456e-05,  ...,  7.5674e-05,
           5.1505e-05,  7.3184e-05],
         ...,
         [ 7.0815e-05,  7.7024e-05,  7.5674e-05,  ...,  7.8306e-05,
           5.2401e-05,  7.3091e-05],
         [ 4.5725e-05,  5.2779e-05,  5.1505e-05,  ...,  5.2401e-05,
           1.2630e-04,  5.2342e-05],
         [ 6.6394e-05,  7.3498e-05,  7.3184e-05,  ...,  7.3091e-05,
           5.2342e-05,  7.6710e-05]],

        [[ 8.7962e-05,  7.8796e-05,  8.4836e-05,  ...,  6.8979e-05,
          -3.3932e-04,  8.5147e-05],
         [ 7.8796e-05,  9.7942e-05,  8.0335e-05,  ...,  7.1558e-05,
          -3.4493e-04,  8.0538e-05],
         [ 8.4836e-05,  8.0335e-05,  8.8715e-05,  ...,  7.1285e-05,
          -3.3832e-04,  8.6579e-05],
         ...,
         [ 6.8979e-05,  7.1558e-05,  7.1285e-05,  ...,  1.0717e-04,
          -3.2759e-04,  7.0658e-05],
         [-3.3932e-04, -3.4493e-04, -3.3832e-04,  ..., -3.2759e-04,
           2.0091e-03, -3.3867e-04],
         [ 8.5147e-05,  8.0538e-05,  8.6579e-05,  ...,  7.0658e-05,
          -3.3867e-04,  8.8479e-05]],

        [[ 7.2886e-05,  7.0227e-05,  6.9378e-05,  ...,  2.4298e-05,
          -2.1212e-04,  4.5437e-05],
         [ 7.0227e-05,  7.7897e-05,  7.5079e-05,  ...,  3.0011e-05,
          -2.1322e-04,  4.4038e-05],
         [ 6.9378e-05,  7.5079e-05,  8.1487e-05,  ...,  3.7891e-05,
          -2.1287e-04,  4.3175e-05],
         ...,
         [ 2.4298e-05,  3.0011e-05,  3.7891e-05,  ...,  1.6317e-04,
          -1.3859e-04,  1.1140e-05],
         [-2.1212e-04, -2.1322e-04, -2.1287e-04,  ..., -1.3859e-04,
           1.2692e-03, -1.1853e-04],
         [ 4.5437e-05,  4.4038e-05,  4.3175e-05,  ...,  1.1140e-05,
          -1.1853e-04,  1.1943e-04]],

        ...,

        [[ 1.1128e-04,  7.6126e-05,  6.1290e-05,  ..., -3.1085e-05,
           3.2135e-05,  7.8840e-05],
         [ 7.6126e-05,  9.6022e-05,  7.4200e-05,  ..., -2.1044e-05,
           4.0400e-05,  9.1960e-05],
         [ 6.1290e-05,  7.4200e-05,  9.2034e-05,  ..., -3.4788e-05,
           6.1061e-05,  7.5238e-05],
         ...,
         [-3.1085e-05, -2.1044e-05, -3.4788e-05,  ...,  2.3243e-04,
          -4.0345e-05, -1.9683e-05],
         [ 3.2135e-05,  4.0400e-05,  6.1061e-05,  ..., -4.0345e-05,
           1.3551e-04,  4.2297e-05],
         [ 7.8840e-05,  9.1960e-05,  7.5238e-05,  ..., -1.9683e-05,
           4.2297e-05,  9.5137e-05]],

        [[ 3.5247e-04, -1.0403e-04, -9.9169e-05,  ...,  8.8076e-05,
           1.3890e-04, -1.0403e-04],
         [-1.0403e-04,  9.9529e-05,  9.6215e-05,  ..., -6.8703e-05,
          -1.5972e-04,  9.9422e-05],
         [-9.9169e-05,  9.6215e-05,  9.6699e-05,  ..., -6.0130e-05,
          -1.5089e-04,  9.6215e-05],
         ...,
         [ 8.8076e-05, -6.8703e-05, -6.0130e-05,  ...,  3.4801e-04,
           1.6812e-04, -6.8703e-05],
         [ 1.3890e-04, -1.5972e-04, -1.5089e-04,  ...,  1.6812e-04,
           5.3045e-04, -1.5972e-04],
         [-1.0403e-04,  9.9422e-05,  9.6215e-05,  ..., -6.8703e-05,
          -1.5972e-04,  9.9529e-05]],

        [[ 8.5129e-05,  7.4716e-05,  6.6423e-05,  ...,  7.6044e-05,
           7.4234e-05,  7.5957e-05],
         [ 7.4716e-05,  7.9972e-05,  6.7788e-05,  ...,  7.6819e-05,
           7.5168e-05,  7.6890e-05],
         [ 6.6423e-05,  6.7788e-05,  9.3844e-05,  ...,  6.8510e-05,
           6.6988e-05,  6.8581e-05],
         ...,
         [ 7.6044e-05,  7.6819e-05,  6.8510e-05,  ...,  7.8565e-05,
           7.6339e-05,  7.8062e-05],
         [ 7.4234e-05,  7.5168e-05,  6.6988e-05,  ...,  7.6339e-05,
           8.5363e-05,  7.6411e-05],
         [ 7.5957e-05,  7.6890e-05,  6.8581e-05,  ...,  7.8062e-05,
           7.6411e-05,  7.8481e-05]]])

My eigenvalues were on the order of 10^-9, and some were negative. Is that because of truncation error?
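One way to check this is to look at the smallest eigenvalue in float64 and see whether a small jitter lifts it above zero. The sketch below uses made-up random features, not the data from the trace, and the helper name `covariance_report` is hypothetical. It relies on a general fact: when there are fewer samples than feature dimensions, the sample covariance is rank-deficient, so many eigenvalues are zero up to rounding and can come out slightly negative.

```python
import torch


def covariance_report(cov):
    """Return (max asymmetry, smallest eigenvalue) for covariance matrices.

    The computation is done in float64 to reduce rounding error.
    """
    cov64 = cov.double()
    asymmetry = (cov64 - cov64.transpose(-1, -2)).abs().max().item()
    # eigvalsh assumes a symmetric input, so symmetrise explicitly first.
    sym = 0.5 * (cov64 + cov64.transpose(-1, -2))
    min_eig = torch.linalg.eigvalsh(sym).min().item()
    return asymmetry, min_eig


# Hypothetical data: 10 samples of 50-dimensional features, so rank <= 9.
torch.manual_seed(0)
features = torch.randn(10, 50)
centered = features - features.mean(dim=0, keepdim=True)
cov = centered.T @ centered / (features.shape[0] - 1)

asymmetry, min_eig = covariance_report(cov)

# Adding a small multiple of the identity shifts every eigenvalue up by that amount.
jitter = 1e-4
jittered = cov.double() + jitter * torch.eye(50, dtype=torch.float64)
jittered_min_eig = torch.linalg.eigvalsh(jittered).min().item()
```

If the smallest eigenvalue is tiny relative to the largest and only marginally negative, rounding in a near-singular matrix is a plausible explanation. Separately, the trace above reports loc with shape [20, 1] next to a covariance of shape [20, 2048, 2048], which may be worth checking since the mean would normally carry the feature dimension as well.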

jeethesh-pai avatar Nov 15 '22 21:11 jeethesh-pai

You're not using code from our repository? 😮 calc_mean_covariance.py is not one of our scripts 😄

BlackHC avatar Nov 15 '22 21:11 BlackHC

Oh yeah, I forgot to mention that. I am just trying to fit the Gaussian to the logits obtained from my trained model. Did you ever encounter this same error?

jeethesh-pai avatar Nov 15 '22 21:11 jeethesh-pai

Hi,

For anyone who is getting a ValueError because the covariance matrix is not PositiveDefinite(), the answer lies in this line:

https://github.com/omegafragger/DDU/blob/f597744c65df4ff51615ace5e86e82ffefe1cd0f/utils/gmm_utils.py#L93

jeethesh-pai avatar Nov 17 '22 09:11 jeethesh-pai

Please ignore the above. The issue is that you're using the "wrong" PyTorch version: not the one the code was written for, which is pinned in the environment of this repo.

Re the other point: please open a new issue for that.

Thanks!

On Thu, Nov 17, 2022, 09:36 Jeethesh Pai Umesh @.***> wrote:

@BlackHC https://github.com/BlackHC Can you please tell me where you multiply the prior with the log_prob of the logits during evaluation? [image: graphic] https://user-images.githubusercontent.com/75034628/202408223-1acf274d-23a7-494f-8965-153f1a2e346e.png

As mentioned in line 12, we have to multiply the probability of the feature vector by the prior obtained from training. I saw the parts where you perform

https://github.com/omegafragger/DDU/blob/f597744c65df4ff51615ace5e86e82ffefe1cd0f/metrics/uncertainty_confidence.py#L16 and

https://github.com/omegafragger/DDU/blob/f597744c65df4ff51615ace5e86e82ffefe1cd0f/evaluate.py#L211

but didn't actually find the code snippet where you multiply by the prior. Is that required, or is it enough to just calculate the logsumexp() of the logits obtained?

Thanks @BlackHC https://github.com/BlackHC
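To make the question concrete: the marginal density is p(z) = sum_c p(z | c) * pi_c, so in log space the prior enters as an added log pi_c term before the logsumexp. The sketch below only illustrates that identity; it is not taken from the repository, and the function name `log_marginal_density` and the example numbers are hypothetical.

```python
import math

import torch


def log_marginal_density(class_log_probs, class_priors=None):
    """Compute log p(z) from per-class log densities.

    class_log_probs: (N, C) tensor of log p(z | c).
    class_priors: optional (C,) tensor of class priors summing to 1.
    """
    if class_priors is not None:
        # Multiplying by the prior becomes adding its log.
        class_log_probs = class_log_probs + torch.log(class_priors)
    return torch.logsumexp(class_log_probs, dim=-1)


# Made-up per-class log densities for a single sample with three classes.
log_probs = torch.tensor([[-1.0, -2.0, -3.0]])
uniform_prior = torch.full((3,), 1.0 / 3.0)
skewed_prior = torch.tensor([0.7, 0.2, 0.1])

unweighted = log_marginal_density(log_probs)
with_uniform = log_marginal_density(log_probs, uniform_prior)
with_skewed = log_marginal_density(log_probs, skewed_prior)
```

With a uniform prior, leaving out the prior only shifts every value by the constant log C, so any ranking of samples by density is unchanged. With unbalanced classes the weighted and unweighted values differ by more than a constant, so the prior term matters there.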


BlackHC avatar Nov 17 '22 11:11 BlackHC