
Fixed tests in test_precision_recall_curve and tests/ignite/contrib/metrics/regression

sayantan1410 opened this issue 2 years ago • 19 comments

Related to #2490

I couldn't figure out why the test was passing initially, I probably needed to gather data from all the processes with idist.all_gather. But as per the conversation in the mentioned PR, I have made the changes to it. Let me know if anything needs to be changed; I will be happy to fix it.

sayantan1410 avatar Mar 11 '22 05:03 sayantan1410

I couldn't figure out why the test was passing initially, I probably needed to gather data from all the processes with idist.all_gather.

@sayantan1410 PR seems good. However, we need to understand why the tests were passing previously. Please fetch all the data and see why it was passing. Thanks
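
For reference, "fetching all the data" in a distributed test usually means gathering the per-rank shards with idist.all_gather before comparing against sklearn. A minimal sketch, assuming placeholder local tensors rather than the exact data used in the test:

import torch
import ignite.distributed as idist
from sklearn.metrics import precision_recall_curve

# Each rank holds only its local shard of targets and predictions.
local_y_true = torch.randint(0, 2, size=(100,))
local_y_preds = torch.rand(100)

# Gather the shards so every process holds the full dataset, then compute
# the sklearn reference on exactly the same data the metric has seen.
y_true = idist.all_gather(local_y_true)
y_preds = idist.all_gather(local_y_preds)

sk_precision, sk_recall, sk_thresholds = precision_recall_curve(
    y_true.cpu().numpy(), y_preds.cpu().numpy()
)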

vfdev-5 avatar Mar 11 '22 10:03 vfdev-5

I think there are some other tests broken like this one in regression.

sdesrozis avatar Mar 11 '22 18:03 sdesrozis

@vfdev-5 Yeah, trying to look for the reason!!

sayantan1410 avatar Mar 12 '22 04:03 sayantan1410

@sdesrozis Should I change them in this PR only?

sayantan1410 avatar Mar 12 '22 04:03 sayantan1410

@sayantan1410 It would be great if you could look for other corrupted tests and fix them if you find any. I would say it's mainly in contrib/metrics/regression. You will just have to rename the PR if other fixes are done.

Thanks a lot for your help!

sdesrozis avatar Mar 12 '22 07:03 sdesrozis

Yeah sure will do that.

sayantan1410 avatar Mar 12 '22 08:03 sayantan1410

Probable reason why https://github.com/pytorch/ignite/blob/2eb3bcf4949821a3c894a21a781c5a34bb5545e2/tests/ignite/contrib/metrics/test_precision_recall_curve.py#L196 is passing in spite of being wrong: there is a difference in value between precision, recall, thresholds and sk_precision, sk_recall, sk_thresholds in a distributed configuration; however, the difference was small and pytest.approx was absorbing it, so the tests were passing. Code to reproduce:

import torch
import ignite.distributed as idist
from ignite.contrib.metrics import PrecisionRecallCurve
from sklearn.metrics import precision_recall_curve
from ignite.engine import Engine


rank = idist.get_rank()
torch.manual_seed(12)
device = idist.device()

def _test(n_epochs, metric_device):
    metric_device = torch.device(metric_device)
    n_iters = 80
    size = 151
    offset = n_iters * size
    y_true = torch.randint(0, 2, size=(offset * idist.get_world_size(),)).to(device)
    y_preds = torch.randint(0, 2, size=(offset * idist.get_world_size(),)).to(device)

    def update(engine, i):
        return (
            y_preds[i * size + rank * offset : (i + 1) * size + rank * offset],
            y_true[i * size + rank * offset : (i + 1) * size + rank * offset],
        )
    engine = Engine(update)

    prc = PrecisionRecallCurve(device=metric_device)
    prc.attach(engine, "prc")

    data = list(range(n_iters))
    engine.run(data=data, max_epochs=n_epochs)

    assert "prc" in engine.state.metrics

    precision, recall, thresholds = engine.state.metrics["prc"]

    np_y_true = y_true.cpu().numpy().ravel()
    np_y_preds = y_preds.cpu().numpy().ravel()

    sk_precision, sk_recall, sk_thresholds = precision_recall_curve(np_y_true, np_y_preds)
    print(f"precision: {precision}")
    print(f"recall: {recall}")
    print(f"thresholds: {thresholds}")
    print(f"sk_precision: {sk_precision}")
    print(f"sk_recall: {sk_recall}")
    print(f"sk_thresholds: {sk_thresholds}")

metric_devices = ["cpu"]
if device.type != "xla":
    metric_devices.append(idist.device())
for metric_device in metric_devices:
    for _ in range(1):
        print("new test")
        _test(n_epochs=1, metric_device=metric_device)

Command I used to run the code:

torchrun --nproc_per_node 2 filename.py --backend <preferred backend>
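
As an aside, the reason such a small per-rank discrepancy can go unnoticed is that pytest.approx compares with a default relative tolerance of 1e-6. A minimal illustration of that behaviour, with made-up values unrelated to the actual metric output:

import numpy as np
import pytest

sk_precision = np.array([0.5012345, 0.751234, 1.0])
# Simulate a tiny discrepancy, similar to what shows up in the distributed run.
precision = sk_precision + 1e-8

# Passes: the difference is below the default relative tolerance of 1e-6,
# so the comparison does not notice the mismatch.
assert precision == pytest.approx(sk_precision)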

sayantan1410 avatar Mar 13 '22 14:03 sayantan1410

@vfdev-5 @sdesrozis Hey, the tests for the metrics in contrib/metrics/regression are probably fixed now; can you check once?

sayantan1410 avatar Mar 13 '22 14:03 sayantan1410

@sayantan1410 your repro code example does not reproduce the issue and does not work with DDP as there is no communication group created.

vfdev-5 avatar Mar 14 '22 10:03 vfdev-5

@vfdev-5

your repro code example does not reproduce the issue and does not work with DDP as there is no communication group created.

Yeah, I realized this now. Also, can you review the changes in this PR once? It would be helpful for me.

sayantan1410 avatar Mar 14 '22 17:03 sayantan1410

@vfdev-5 @sdesrozis this was the change I was thinking about, please check once.

sayantan1410 avatar Mar 20 '22 13:03 sayantan1410

@vfdev-5 yeah, now I understand it better than at that time :) Also, what should I do to make the tests uniform?

sayantan1410 avatar Mar 20 '22 16:03 sayantan1410

@vfdev-5 Hey, any idea why the hvd test is failing? The other tests are passing now.

sayantan1410 avatar Mar 21 '22 08:03 sayantan1410

@sayantan1410 looks like the majority of tests are failing, not only hvd... Check:

  • https://github.com/pytorch/ignite/runs/5618074267?check_suite_focus=true
  • https://github.com/pytorch/ignite/runs/5618074236?check_suite_focus=true#step:9:1624

vfdev-5 avatar Mar 21 '22 10:03 vfdev-5

@vfdev-5 Hey, those are from the old runs; I have made a commit after that, and it shows the tests passing except for hvd. You probably need to refresh the page.

sayantan1410 avatar Mar 21 '22 12:03 sayantan1410

Oh, I see, sorry. I reran the tests; the output looks strange.

vfdev-5 avatar Mar 21 '22 12:03 vfdev-5

@vfdev-5 Hey, I am getting an assertion error unless I increase the tolerance. Any idea how I should fix this?
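
For context, the tolerance being discussed here is the rel/abs argument of pytest.approx; loosening it makes the assertion pass but also hides the discrepancy. A hedged illustration with made-up numbers:

import numpy as np
import pytest

precision = np.array([0.5010, 0.7510, 1.0])
sk_precision = np.array([0.5012, 0.7512, 1.0])

# This would fail with the default relative tolerance of 1e-6:
# assert precision == pytest.approx(sk_precision)

# ... but passes once the tolerance is increased, which is why relying on a
# looser tolerance can mask a real mismatch between distributed and reference values.
assert precision == pytest.approx(sk_precision, rel=1e-2)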

sayantan1410 avatar Mar 23 '22 06:03 sayantan1410

@sayantan1410 can we split this PR so that the improved tests that are passing can be merged, and those that are not passing can be investigated case by case? Otherwise you have a stuck block of modifications...

vfdev-5 avatar Apr 09 '22 15:04 vfdev-5

@vfdev-5 I could do that; basically, only the precision_recall_curve one is working. If possible, can we have a quick meeting? I can show you where I am stuck, and maybe we can solve the issue.

sayantan1410 avatar Apr 09 '22 16:04 sayantan1410

Closing this PR in favor of #2655. Thanks for your work @sayantan1410 !

vfdev-5 avatar Sep 04 '22 15:09 vfdev-5