Compute metric per image and handler mutex in DistributedDataParallel
❓ Questions/Help/Support
Hi @vfdev-5 ,
I am writing an ignite handler to write the segmentation metrics of every image into one CSV file as a summary, for example:
metrics.csv:
/data/spleen/image_1 0.85
/data/spleen/image_2 0.87
/data/spleen/image_3 0.91
... ...
The problems are:
- I tried to add logic to metrics.update() to cache every record and write to the CSV in metrics.complete(), but ignite.metrics only accepts an output_transform, so I can't extract the filenames from engine.state.batch.
- Then I changed to writing a separate handler for this feature, but ignite metrics only save the final average metric into engine.state.metrics, so it is not easy for a handler to get the metric value corresponding to every image.
- Another problem is DistributedDataParallel: when I run the handler with multi-processing, how do you usually use a multi-processing lock to save content into one CSV file on both Unix and Windows?
Thanks.
@Nic-Ma Hi!
- You are right, metrics works on the (transformed) outputs of the process function, but the batch is embedded in engine.state. I suppose the filenames are provided by the dataset and are packed into every batch when __getitem__ is called (a sketch follows this list). I could try on my side to see if I can help.
- Again, if we define the dataflow from dataset to metrics, we could go further together. What do you think?
- No idea atm.
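A minimal sketch of such a dataset, assuming the filenames are simply stored in a list (all names, shapes and values below are made up for illustration):
import torch
from torch.utils.data import Dataset, DataLoader

class SegmentationDataset(Dataset):
    # toy dataset that returns the filename together with the image and label
    def __init__(self, filenames):
        self.filenames = filenames

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, index):
        # dummy image/label tensors; a real dataset would load them from disk
        x = torch.rand(3, 32, 32)
        y = torch.randint(0, 10, size=(32, 32))
        return {"x": x, "y": y, "filename": self.filenames[index]}

filenames = [f"/data/spleen/image_{i}" for i in range(8)]
loader = DataLoader(SegmentationDataset(filenames), batch_size=4)
for batch in loader:
    print(batch["filename"])  # list of filenames for this batch, later visible via engine.state.batch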
Hi @Nic-Ma ,
As far as I understand your question, it is about evaluating a pretrained model on images and writing out the metric per image.
For a single process, IMO, this can be done like this:
import torch

from ignite.engine import Engine, Events
from ignite.metrics import Accuracy, BatchWise

num_classes = 10
batch_size = 4

# fake data: each batch carries the filenames of its images
data = [
    {
        "x": torch.rand(batch_size, 3, 32, 32),
        "y": torch.randint(0, num_classes, size=(batch_size, 32, 32)),
        "filenames": [f"file_{i + 0 * batch_size}" for i in range(batch_size)]
    },
    {
        "x": torch.rand(batch_size, 3, 32, 32),
        "y": torch.randint(0, num_classes, size=(batch_size, 32, 32)),
        "filenames": [f"file_{i + 1 * batch_size}" for i in range(batch_size)]
    },
]

def infer_step(engine, batch):
    x = batch["x"]
    y = batch["y"]
    # dummy predictions; a real step would run the model on x
    return {
        "y_pred": torch.rand(batch_size, num_classes, 32, 32),
        "y": y,
    }

evaluator = Engine(infer_step)
acc_metric = Accuracy()

@evaluator.on(Events.ITERATION_COMPLETED)
def compute_metric_per_image(engine):
    print("compute_metric_per_image")
    output = engine.state.output
    batch = engine.state.batch
    assert len(output["y_pred"]) == len(output["y"])
    assert len(batch["filenames"]) == len(output["y"])
    engine.state.metrics = []
    for y_pred, y, filename in zip(output["y_pred"], output["y"], batch["filenames"]):
        # reset and update the metric with a single sample to get a per-image value
        acc_metric.reset()
        acc_metric.update((y_pred.unsqueeze(0), y.unsqueeze(0)))
        o = {
            "filename": filename,
            "accuracy": acc_metric.compute(),
        }
        engine.state.metrics.append(o)

@evaluator.on(Events.ITERATION_COMPLETED)
def save_files():
    print("append data to CSV:", evaluator.state.metrics)

evaluator.run(data)
Another problem is DistributedDataParallel: when I run the handler with multi-processing, how do you usually use a multi-processing lock to save content into one CSV file on both Unix and Windows?
Maybe the idea is to gather all data from all participating processes on a single one and then use a barrier (idist.barrier()) so that only a single process writes to the file.
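A rough sketch of that approach with ignite.distributed as idist (the helper name write_metrics_csv and joining the filenames into a single string per rank are just illustrative choices, not an ignite API):
import torch
import ignite.distributed as idist

def write_metrics_csv(filenames, values, path="metrics.csv"):
    # filenames: list of str, values: list of float, both local to the current process
    gathered_values = idist.all_gather(torch.tensor(values))  # values from all ranks, in rank order
    gathered_names = idist.all_gather("\n".join(filenames))   # one joined string per rank
    if isinstance(gathered_names, str):                        # non-distributed fallback
        gathered_names = [gathered_names]
    if idist.get_rank() == 0:
        all_names = "\n".join(gathered_names).split("\n")
        with open(path, "w") as f:
            for name, value in zip(all_names, gathered_values.tolist()):
                f.write(f"{name}\t{value}\n")
    # every process waits until rank 0 has written the file
    idist.barrier()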
Hi @vfdev-5 ,
Thanks very much for your detailed example!
Your program works for the metric-saving case, but it requires manually calling metrics.reset() and metrics.update() and setting values in evaluator.state.metrics, which should be done by the ignite.Metric base class when we attach metrics to the engine.
Maybe we can keep the existing metrics logic, just save the metric of every image into a value in engine.state in update(), then develop a MetricsSaver handler to get the value from engine.state and write it to the CSV file?
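A rough sketch of what such a MetricsSaver handler could look like (the class name and the list layout of engine.state.metrics are hypothetical, following the per-image example above):
import csv
from ignite.engine import Events

class MetricsSaver:
    # hypothetical handler: collects per-image metrics from engine.state.metrics
    # (assumed to be a list of {"filename": ..., "accuracy": ...} dicts) and writes a CSV at the end
    def __init__(self, filename="metrics.csv"):
        self.filename = filename
        self.rows = []

    def update(self, engine):
        self.rows.extend(engine.state.metrics)

    def write(self, engine):
        with open(self.filename, "w", newline="") as f:
            writer = csv.writer(f)
            for row in self.rows:
                writer.writerow([row["filename"], row["accuracy"]])

    def attach(self, engine):
        engine.add_event_handler(Events.ITERATION_COMPLETED, self.update)
        engine.add_event_handler(Events.COMPLETED, self.write)

# usage: MetricsSaver("metrics.csv").attach(evaluator)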
To simplify the DistributedDataParallel problem, I am thinking that maybe every GPU process just writes a separate CSV file.
Thanks.
Your program works for the metric-saving case, but it requires manually calling metrics.reset() and metrics.update() and setting values in evaluator.state.metrics, which should be done by the ignite.Metric base class when we attach metrics to the engine.
Yes, that's true. Since we need to compute the metric per image of the batch, we currently cannot simply attach a metric to an engine and get batch_size metric values on each iteration (i.e. without averaging).
A hacky solution can be to set batch_size = 1 and use the BatchWise metric usage, as sketched below.
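A small sketch of that hack, reusing the dataflow from the example above with batch_size = 1 (Accuracy stands in for a real segmentation metric here):
import torch
from ignite.engine import Engine, Events
from ignite.metrics import Accuracy, BatchWise

data = [
    {"x": torch.rand(1, 3, 32, 32), "y": torch.randint(0, 10, size=(1, 32, 32)), "filenames": ["file_0"]},
    {"x": torch.rand(1, 3, 32, 32), "y": torch.randint(0, 10, size=(1, 32, 32)), "filenames": ["file_1"]},
]

def infer_step(engine, batch):
    # dummy predictions for a single image
    return {"y_pred": torch.rand(1, 10, 32, 32), "y": batch["y"]}

evaluator = Engine(infer_step)

# with batch_size = 1, a BatchWise-attached metric yields one value per image
Accuracy(output_transform=lambda out: (out["y_pred"], out["y"])).attach(
    evaluator, "acc_per_image", usage=BatchWise()
)

@evaluator.on(Events.ITERATION_COMPLETED)
def log_per_image(engine):
    print(engine.state.batch["filenames"][0], engine.state.metrics["acc_per_image"])

evaluator.run(data)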
develop a MetricsSaver handler to get the value from engine.state and write to the CSV file?
yes, this makes sense and we can put such a handler into the contrib module!
To simplify the DistributedDataParallel problem, I am thinking that maybe every GPU process just writes a separate CSV file.
yes, this can also be a possible solution :+1:
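For instance, each process could derive its own file name from its rank (a tiny sketch using ignite.distributed):
import ignite.distributed as idist

# each process writes its own file: metrics_rank0.csv, metrics_rank1.csv, ...
csv_path = f"metrics_rank{idist.get_rank()}.csv"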
which should be done by ignite.Metric base class when we attach metrics to the engine.
@sdesrozis maybe we can think about another MetricUsage like InstanceWise?
Thanks, let me do more experiments tomorrow.
which should be done by ignite.Metric base class when we attach metrics to the engine.
@sdesrozis maybe we can think about another MetricUsage like InstanceWise?
Let's think about it. From my side, I have a similar need, so I'm very interested in such a feature.
Hi @vfdev-5 and @sdesrozis ,
I think at least it's valuable to add engine as an argument to Metric.update() and Metric.compute(), so we can put some context into engine.state to share with other components, just as regular handlers do.
Thanks.
@Nic-Ma thanks for the idea. Which use case do you have in mind for using engine inside Metric.update/compute? Thanks
Hi @vfdev-5 ,
I think context information is widely used in most software architectures, like a data bus that every component can produce to and consume from. Your engine is a good context object: all handlers can communicate through the engine, but metrics can't access it directly.
For example, if I want to save some temporary values (the mean Dice of every image) during metric computation and print them out in another handler later, I can save them into engine.state.metrics_report after every iteration.
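A minimal sketch of that pattern with two plain handlers (the metrics_report attribute name follows the message above; the per-image values are placeholders):
from ignite.engine import Engine, Events

def infer_step(engine, batch):
    return batch  # placeholder process function

evaluator = Engine(infer_step)

@evaluator.on(Events.ITERATION_COMPLETED)
def store_report(engine):
    # producer: stash per-image values (e.g. mean Dice of every image) on the state
    engine.state.metrics_report = {"mean_dice_per_image": [0.85, 0.87]}

@evaluator.on(Events.ITERATION_COMPLETED)
def print_report(engine):
    # consumer: another handler reads the same context from the engine
    print(engine.state.metrics_report)

evaluator.run([0, 1])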
Thanks.
I see, thanks for explaining !
add engine as an arg for Metric.update() and Metric.compute()
In my mind, update/compute methods are sort of independent of Engine etc.
Maybe it would make sense to store the engine once it is passed to attach:
def attach(self, engine: Engine, name: str, usage: Union[str, MetricUsage] = EpochWise()) -> None:
    self.engine = engine
Thus, it would be accessible in all methods. We discussed that previously and it seems like it was not retained... @sdesrozis what do you think?
On the other hand, out-of-the-box metrics do not use the engine (except for writing to state.metrics), so a custom metric can also take the engine as an argument on construction...
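A minimal sketch of that last option, a toy custom metric that receives the engine at construction so that update() can read engine.state.batch (the metric and its score are placeholders, not a real Dice implementation):
import torch
from ignite.engine import Engine
from ignite.metrics import Metric

class PerImageScore(Metric):
    # toy custom metric: keeps a reference to the engine so update() can read engine.state.batch
    def __init__(self, engine: Engine, output_transform=lambda x: x):
        self.engine = engine
        self._per_image = []
        super().__init__(output_transform=output_transform)

    def reset(self):
        self._per_image = []

    def update(self, output):
        y_pred, y = output
        filenames = self.engine.state.batch["filenames"]  # context taken from the engine
        for p, t, name in zip(y_pred, y, filenames):
            # placeholder per-image pixel accuracy; a real metric would compute e.g. Dice here
            score = float((p.argmax(dim=0) == t).float().mean())
            self._per_image.append((name, score))

    def compute(self):
        return self._per_image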
I agree that having engine as a hub for handlers is very convenient. This ensures very strong customisation possibilities.
In the case of Metric (which is a handler like any other), the engine is passed as an argument to each call of the attached methods (iteration_completed, etc.). The point is that Metric also offers a user API that is completely agnostic of the engine. I think it is a good idea to keep this separation.
I even think we could go further in the separation and adopt a composite pattern rather than inheritance. But that's irrelevant here.
Keeping a reference to the engine when calling attach, as suggested by @vfdev-5, is a smart and simple option. If it solves @Nic-Ma's issue, I think we should do it.
@sdesrozis the problem here is that we already have an existing API for the detach and is_attached methods, which always require the engine, and adding Metric.engine would require updating them too, I think.
Yes, there would be side effects to control...
Hi @vfdev-5 and @sdesrozis ,
Thanks for your discussion.
I added self.engine to the metrics in this draft MONAI PR: https://github.com/Project-MONAI/MONAI/pull/1497.
I will delete it in the future once you add engine to the Metric base class.
Thanks.
Sounds good! Thanks for the update @Nic-Ma