physicsnemo
🐛[BUG]: DistributedManager gets silently initialized as a single process job if instantiated before initializing
Version
main
On which installation method(s) does this occur?
Source
Describe the issue
This works as expected:
In [1]: from modulus.distributed import DistributedManager
In [2]: DistributedManager.is_initialized()
Out[2]: False
In [3]: DistributedManager.initialize()
In [4]: DistributedManager.is_initialized()
Out[4]: True
In [5]: manager = DistributedManager()
In [6]: manager._initialization_method
Out[6]: 'None'
but this does not:
In [1]: from modulus.distributed import DistributedManager
In [2]: manager = DistributedManager()
In [3]: manager._initialization_method
Out[3]: 'None'
In [4]: manager.is_initialized()
Out[4]: True
Minimum reproducible example
In [1]: from modulus.distributed import DistributedManager
In [2]: manager = DistributedManager()
In [3]: manager._initialization_method
Out[3]: 'None'
In [4]: manager.is_initialized()
Out[4]: True
In [5]: manager.initialize()
/code/modulus-core/modulus/distributed/manager.py:302: UserWarning: Distributed manager is already intialized
warn("Distributed manager is already intialized")
Relevant log output
No response
Environment details
No response
One reason this happens is that the initialization check in DistributedManager is based on the size of DistributedManager._shared_state: https://github.com/NVIDIA/modulus/blob/main/modulus/distributed/manager.py#L194-L197
As the reproduction above shows, instantiating the class is enough for that check to report True, whether or not initialize was ever called.
This silent initialization can be caught by adding an explicit _is_initialized member to the Borg class and setting it to True only in the initialize method.
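To make the failure mode concrete, here is a minimal sketch of a Borg-style class whose initialization check depends on the size of its shared state. ToyManager and its single default attribute are hypothetical stand-ins, not the actual DistributedManager source. The sketch assumes, as the transcripts above suggest, that defaults are written into the shared state on first instantiation.

```python
class ToyManager:
    """Hypothetical stand-in for a Borg-style manager (not the real class)."""

    _shared_state = {}  # every instance uses this dict as its __dict__

    def __new__(cls):
        obj = super().__new__(cls)
        obj.__dict__ = cls._shared_state
        # Assumption: defaults are set on first instantiation, which
        # populates _shared_state even though nothing was initialized.
        if not hasattr(obj, "_initialization_method"):
            obj._initialization_method = "None"
        return obj

    @classmethod
    def is_initialized(cls):
        # Size-based check: true as soon as anything is stored.
        return len(cls._shared_state) > 0


print(ToyManager.is_initialized())  # nothing stored yet
manager = ToyManager()              # writes the default attribute
print(ToyManager.is_initialized())  # now reports initialized
```

In this sketch the second check flips only because the constructor wrote a default value, which mirrors the silent single-process initialization reported above.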
@tge25 @dallasfoster Would this be a better way to prevent accidental usage of the DistributedManager before it is initialized?
In [1]: from modulus.distributed import DistributedManager
In [2]: DistributedManager.is_initialized()
Out[2]: False
In [3]: manager = DistributedManager()
---------------------------------------------------------------------------
ModulusUninitializedDistributedManagerWarning Traceback (most recent call last)
Cell In[3], line 1
----> 1 manager = DistributedManager()
File /code/modulus-core/modulus/distributed/manager.py:115, in DistributedManager.__init__(self)
113 def __init__(self):
114 if not self._is_initialized:
--> 115 raise ModulusUninitializedDistributedManagerWarning()
116 super().__init__()
ModulusUninitializedDistributedManagerWarning: Instantiating DistributedManager before calling DistributedManager.initialize is not recommended
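The behavior in the traceback above could come from something like the following sketch, which keeps an explicit class-level flag and refuses instantiation until initialize has run. GuardedManager and UninitializedManagerError are hypothetical names chosen for illustration. The real setup logic (rank, world size, process group) is omitted, and this is not the proposed patch itself.

```python
class UninitializedManagerError(RuntimeError):
    """Hypothetical exception; the name used in the traceback above differs."""


class GuardedManager:
    """Hypothetical Borg-style manager with an explicit initialization flag."""

    _shared_state = {}
    _is_initialized = False  # class-level flag, set only by initialize()

    def __new__(cls):
        obj = super().__new__(cls)
        obj.__dict__ = cls._shared_state
        return obj

    def __init__(self):
        # Falls back to the class attribute, since the shared instance
        # dict never stores _is_initialized itself.
        if not self._is_initialized:
            raise UninitializedManagerError(
                "Instantiating the manager before calling initialize() "
                "is not recommended"
            )

    @classmethod
    def initialize(cls):
        # Real distributed setup would go here (omitted in this sketch).
        cls._is_initialized = True

    @classmethod
    def is_initialized(cls):
        return cls._is_initialized


try:
    GuardedManager()  # rejected: initialize() has not been called
except UninitializedManagerError as err:
    print(f"caught: {err}")

GuardedManager.initialize()
manager = GuardedManager()  # accepted after explicit initialization
print(GuardedManager.is_initialized())
```

With the flag decoupled from the contents of _shared_state, is_initialized no longer changes as a side effect of construction. Whether to raise or only warn on early instantiation is a separate design choice.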