Holger Roth

Results 23 comments of Holger Roth

The issue came from using an old UNet style config in the NVFlare configuration.
```
>>> net = UNet(
...     dimensions=2,
...     in_channels=1,
...     out_channels=2,
...     channels=(16, 32, 64, 128,...
```
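For reference, a minimal sketch of the updated form: recent MONAI releases use `spatial_dims` in place of the deprecated `dimensions` argument, and the channel/stride values below are just illustrative.

```python
from monai.networks.nets import UNet

# Same network as above, but with the current argument name `spatial_dims`
# instead of the deprecated `dimensions`; channels/strides are example values.
net = UNet(
    spatial_dims=2,
    in_channels=1,
    out_channels=2,
    channels=(16, 32, 64, 128, 256),
    strides=(2, 2, 2, 2),
)
```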

`epoch_length` corresponds to the number of iterations needed to iterate once over the data (i.e., one epoch). It defaults to `len(train_data_loader)`.

Yes, I would propose that if `n_iterations` is provided, we should set `max_epochs=1` and `epoch_length=n_iterations`.
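As a sketch of that proposal, Ignite's `Engine.run()` already accepts both arguments; the training step, data iterable, and `n_iterations` value below are placeholders:

```python
from ignite.engine import Engine

def train_step(engine, batch):
    return batch  # placeholder training step

trainer = Engine(train_step)

data = range(1000)   # stands in for train_data_loader
n_iterations = 50    # hypothetical iteration budget requested by the caller

# Run exactly `n_iterations` iterations by treating them as a single epoch;
# without `epoch_length`, one epoch would default to len(data).
trainer.run(data, max_epochs=1, epoch_length=n_iterations)
```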

This functionality should also be supported by evaluation/inference workflows. Consider calling them `interrupt()` and `terminate()` to better align with the Ignite naming convention.

FYI, `interrupt()` is necessary to support iteration-based aggregation workflows in FL.
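A minimal sketch of how that could look on the trainer side, assuming an Ignite release that provides `Engine.interrupt()`; the handler name and aggregation interval are hypothetical:

```python
from ignite.engine import Engine, Events

def train_step(engine, batch):
    return batch  # placeholder training step

trainer = Engine(train_step)

AGG_INTERVAL = 100  # hypothetical number of local iterations per FL aggregation round

@trainer.on(Events.ITERATION_COMPLETED(every=AGG_INTERVAL))
def pause_for_aggregation(engine):
    # interrupt() pauses the run so the FL client can exchange model weights;
    # a later trainer.run() call is meant to resume from the interrupted state.
    engine.interrupt()
```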

Hi @vfdev-5, thanks for following up on this.
1. I agree, we probably don't need to fire these events as the epoch has already started earlier.
2. The intention was...

Yes, for `interrupt()`, the memory release shouldn't be necessary, but `terminate()` should release it if possible, I feel.
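A rough sketch of that distinction, assuming Ignite's `Events.TERMINATE` event and using `torch.cuda.empty_cache()` to drop cached allocations (the handler name is made up):

```python
import torch
from ignite.engine import Engine, Events

def train_step(engine, batch):
    return batch  # placeholder training step

trainer = Engine(train_step)

@trainer.on(Events.TERMINATE)
def release_memory(engine):
    # Only when terminate() ends the run for good: free cached GPU memory.
    # After interrupt() the run is expected to resume, so nothing is released.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```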

Thanks for raising this issue. If GPU memory is the problem, can you try reducing the batch size [here](https://github.com/Project-MONAI/tutorials/blob/0c83205c3f25c120428dfaa16bc584384cdfa986/federated_learning/breast_density_challenge/code/configs/mammo_fedavg/config/config_fed_client.json#L36) and [rebuilding](https://github.com/Project-MONAI/tutorials/tree/main/federated_learning/breast_density_challenge#12-build-container) the container before running again.

@unalakunal, any updates on this issue?

Am I right that you only have one GPU? If so, did you adjust the GPU indices in [run_all_fl.sh](https://github.com/Project-MONAI/tutorials/blob/a1064b910c31f2a5288cc4d668e4a7c08346fbc2/federated_learning/breast_density_challenge/run_all_fl.sh#L7)? They should all be 0 if you have just one GPU. I think...