gluon-cv
[Feature request] I think we should hybridize the segmentation models.
The op F.contrib.BilinearResize2D needs to know the destination size explicitly, so the segmentation models are not hybridized at the moment in order to determine the destination size dynamically. I think we should hybridize the models later to improve performance and reduce memory consumption.
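For context, this is roughly the pattern that forces the plain `Block` implementation today. The snippet below is a minimal illustrative sketch (the class and layers are made up, not the actual gluon-cv code):

```python
import mxnet as mx
from mxnet.gluon import nn

class SegSketch(nn.Block):
    """Illustrative only: the destination size comes from x.shape at runtime,
    which a Symbol cannot provide, so the model stays an imperative Block."""
    def __init__(self, nclass, **kwargs):
        super(SegSketch, self).__init__(**kwargs)
        self.features = nn.Conv2D(64, kernel_size=3, strides=8, padding=1)  # stand-in backbone
        self.head = nn.Conv2D(nclass, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]   # concrete NDArray shape needed here
        x = self.head(self.features(x))
        # height/width must be plain Python ints today
        return mx.nd.contrib.BilinearResize2D(x, height=h, width=w)
```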
Agreed. @zhanghang1989, we might be able to modify the operator by allowing it to take NDArray shapes rather than scalar arguments only.
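To make the proposal concrete, something along these lines is what such a change would enable. Treat the `like=` / `mode='like'` interface below as a purely hypothetical sketch for discussion, not an existing operator signature:

```python
import mxnet as mx
from mxnet.gluon import nn

class SegSketchHybrid(nn.HybridBlock):
    """Hypothetical: assumes BilinearResize2D could take a reference tensor
    (`like=`) whose shape defines the output size. That argument is an
    assumption for illustration, not a documented signature."""
    def __init__(self, nclass, **kwargs):
        super(SegSketchHybrid, self).__init__(**kwargs)
        self.features = nn.Conv2D(64, kernel_size=3, strides=8, padding=1)
        self.head = nn.Conv2D(nclass, kernel_size=1)

    def hybrid_forward(self, F, x):
        y = self.head(self.features(x))
        # resize y back to x's spatial size without ever reading .shape in Python
        return F.contrib.BilinearResize2D(y, like=x, mode='like')
```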
@chinakook @zhreshold Right now, the training of semantic segmentation depends on sync batchnorm, which utilizes dataparallel:
https://github.com/dmlc/gluon-cv/blob/master/scripts/segmentation/train.py#L163-L164
And our implementation of dataparallel depends on threading:
https://github.com/dmlc/gluon-cv/blob/master/gluoncv/utils/parallel.py#L3
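For anyone following along, the data-parallel wrapper boils down to a thread-per-device pattern roughly like the simplified sketch below (not the verbatim gluon-cv code); every GPU has to be inside the same forward pass at the same time so that the cross-device BatchNorm statistics can be exchanged:

```python
import threading

def parallel_apply(module, inputs):
    """Simplified sketch: run `module` on each device's slice in its own thread."""
    results = [None] * len(inputs)
    lock = threading.Lock()

    def _worker(i, args):
        out = module(*args)
        with lock:
            results[i] = out

    threads = [threading.Thread(target=_worker, args=(i, args))
               for i, args in enumerate(inputs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```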
I searched online for a while, and it seems MXNet is not thread-safe. So the training of semantic segmentation models is not hybridizable as long as we need SyncBN. It will throw an error like this:
`AttributeError: 'NoneType' object has no attribute '__exit__'`
The inference of semantic segmentation is hybridizable though.
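A hedged sketch of that inference case: if the output resolution is fixed up front, the resize no longer needs the runtime shape and the whole network can be a HybridBlock (again illustrative names, not the gluon-cv classes):

```python
import mxnet as mx
from mxnet.gluon import nn

class SegSketchFixed(nn.HybridBlock):
    """Illustrative only: with a fixed destination size the model hybridizes."""
    def __init__(self, nclass, out_size=(480, 480), **kwargs):
        super(SegSketchFixed, self).__init__(**kwargs)
        self._out_h, self._out_w = out_size
        self.features = nn.Conv2D(64, kernel_size=3, strides=8, padding=1)
        self.head = nn.Conv2D(nclass, kernel_size=1)

    def hybrid_forward(self, F, x):
        x = self.head(self.features(x))
        return F.contrib.BilinearResize2D(x, height=self._out_h, width=self._out_w)

net = SegSketchFixed(nclass=21)
net.initialize()
net.hybridize()
net(mx.nd.zeros((1, 3, 480, 480)))  # static graph is built and cached here
```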
Maybe we can use a local distributed-training style instead of multithreading.
SyncBN is implemented at the operator level, which does not support distributed training. The synchronization happens within the operator and bypasses the engine.
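For reference, that is why the single-machine restriction exists. A minimal usage sketch, assuming the mxnet.gluon.contrib implementation of SyncBatchNorm (the exact layer gluon-cv used at the time may differ):

```python
from mxnet.gluon.contrib.nn import SyncBatchNorm

# num_devices is the number of GPUs on this one machine; the mean/variance
# exchange happens inside the operator across those local GPUs, bypassing the
# engine, which is why it does not extend to multi-machine training.
bn = SyncBatchNorm(num_devices=4)
```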