Wenqi Li


~it's always after `test_auto3dseg` or `test_auto3dseg_ensemble`, probably from~ https://github.com/Project-MONAI/MONAI/blob/9c9777751ab4f96e059a6597b9aa7ac6e7ca3b92/monai/apps/auto3dseg/data_analyzer.py#L209-L218

I'm still debugging; it's happening on a single GPU according to the logs, so my previous conclusion was wrong... please ignore it. I can't replicate this issue locally.

another instance with 23.03 https://blossom.nvidia.com/dlmed-clara-jenkins/blue/organizations/jenkins/Monai-latest-image/detail/Monai-latest-image/785/pipeline/139

I'm pretty sure it's triggered by `test_auto3dseg_ensemble` and/or `test_auto3dseg`

agreed, and I've never seen the issue with the other versions of the containers; it might be 23.03-specific (or PyTorch ~2.0-specific?)

https://blossom.nvidia.com/dlmed-clara-jenkins/blue/organizations/jenkins/Monai-latest-docker/detail/Monai-latest-docker/797/pipeline/

seems to be an OOM problem when the number of threads is large; it can be addressed by setting `OMP_NUM_THREADS=4 MKL_NUM_THREADS=4`. Closing this for now.
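The same caps can also be applied from Python rather than the shell. A minimal sketch (the values are the ones mentioned above; the key assumption is that this runs before the numeric libraries are imported, since OpenMP/MKL size their thread pools at import time):

```python
import os

# Cap the OpenMP and MKL thread pools to limit memory use.
# This must happen BEFORE importing numpy/torch/monai, otherwise
# the libraries have already created their (larger) pools.
os.environ["OMP_NUM_THREADS"] = "4"
os.environ["MKL_NUM_THREADS"] = "4"

# ... import numpy / torch / monai only after this point
```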

I think if ignite's `supervised_training_step` is not directly usable, we should create a util function in `monai.engines.utils`:

```py
def grad_accumulation_iteration(steps=...):
    def iteration(engine, ...):
        ...
        return engine.output
    return iteration
```

and...
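Filled out as a self-contained sketch of the closure idea (plain Python, framework-free; `apply_update` and the unused `engine` argument are placeholders, not MONAI's or ignite's real signature), gradient accumulation buffers scaled contributions and only applies an update every `accum_steps` calls:

```python
def grad_accumulation_iteration(accum_steps, apply_update):
    """Build an iteration function that accumulates scaled gradients and
    calls apply_update(total) once every accum_steps iterations."""
    state = {"grad": 0.0, "count": 0}

    def iteration(engine, grad):
        # Scale each contribution so the accumulated update matches
        # one full-batch step.
        state["grad"] += grad / accum_steps
        state["count"] += 1
        if state["count"] % accum_steps == 0:
            apply_update(state["grad"])
            state["grad"] = 0.0
        return grad  # stands in for engine.output

    return iteration

# Usage: accumulate over 2 "batches", then apply the averaged value once.
applied = []
step = grad_accumulation_iteration(2, applied.append)
step(None, 1.0)
step(None, 3.0)
# applied == [2.0]
```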

sure, please consider creating a function in `monai.engines.utils`; the engine will prefer this `iteration_update` if it's provided: https://github.com/Project-MONAI/MONAI/blob/e375f2a17c098d7b802e5ca64322db6ce874a3aa/monai/engines/workflow.py#L125-L128 This is how we create various `iteration_update` functions, for example: https://github.com/Project-MONAI/MONAI/blob/dev/monai/apps/deepedit/interaction.py#LL26C7-L26C18...
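The preference logic referenced above can be sketched in plain Python (`make_engine` and `default_iteration` are illustrative names, not the actual `Workflow` API):

```python
def make_engine(default_iteration, iteration_update=None):
    """Return the per-iteration callable an engine would use:
    a user-supplied iteration_update wins over the built-in default."""
    if iteration_update is not None:
        if not callable(iteration_update):
            raise ValueError("iteration_update must be callable")
        return iteration_update
    return default_iteration

def default(engine, batch):
    return "default"

def custom(engine, batch):
    return "custom"

chosen = make_engine(default, iteration_update=custom)
```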

https://github.com/Project-MONAI/MONAI/issues/6110#issuecomment-1475689238 the non-deterministic behavior may come from the use of `nn.Upsample`