
[Feature request] I think we should hybridize the segmentation models.

Open chinakook opened this issue 6 years ago • 4 comments

The op F.contrib.BilinearResize2D needs to know the destination size explicitly, so the segmentation models are currently not hybridized; that lets them determine the destination size dynamically. I think we should hybridize the models later to improve performance and reduce memory consumption.
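A toy sketch of the constraint, in plain Python rather than MXNet. The names `Concrete`, `Symbolic`, and `resize_to_input` are hypothetical and only stand in for an imperative array, a graph placeholder, and an op that must be given its output height and width up front.

```python
class Concrete:
    """Stand-in for an imperative array: the shape is known immediately."""

    def __init__(self, height, width):
        self.shape = (height, width)


class Symbolic:
    """Stand-in for a graph placeholder: no shape until data is bound."""

    def __init__(self, name):
        self.name = name


def resize_to_input(feature, image):
    """Describe a resize of `feature` to the spatial size of `image`.

    The op needs an explicit destination height and width, so it can
    only be configured when the shape of `image` is available.
    """
    if not hasattr(image, "shape"):
        raise ValueError("destination size unknown while building the graph")
    height, width = image.shape
    return {"op": "resize", "height": height, "width": width}


# Imperative style: the size is read from the actual input at run time.
print(resize_to_input(Concrete(60, 60), Concrete(480, 480)))

# Symbolic style: the same code cannot determine the size.
try:
    resize_to_input(Symbolic("feat"), Symbolic("data"))
except ValueError as err:
    print("symbolic:", err)
```

Under these assumptions, the imperative call works for any input size, while the symbolic call has nothing to read the size from. That is the trade-off described above.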

chinakook avatar Jan 02 '19 01:01 chinakook

Agreed. @zhanghang1989, we might be able to modify the operator so that it can take NDArray shapes rather than only explicit arguments.

zhreshold avatar Jan 17 '19 19:01 zhreshold

@chinakook @zhreshold Right now, training semantic segmentation models depends on sync batchnorm, which uses DataParallel:

https://github.com/dmlc/gluon-cv/blob/master/scripts/segmentation/train.py#L163-L164

And our implementation of DataParallel depends on threads:

https://github.com/dmlc/gluon-cv/blob/master/gluoncv/utils/parallel.py#L3

I searched online for a while, and it seems MXNet is not thread-safe. So training semantic segmentation models is not hybridizable as long as we need SyncBN. It throws an error like this:

AttributeError: 'NoneType' object has no attribute '__exit__'

The inference of semantic segmentation is hybridizable though.
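For readers unfamiliar with the pattern, here is a minimal sketch of thread-based data parallelism using only the Python standard library. It is not the gluoncv parallel.py implementation; `parallel_apply` and `forward` are hypothetical names, and the "model" is just an average over each shard.

```python
import threading


def parallel_apply(forward, shards):
    """Run `forward` on each shard in its own thread; keep results in order."""
    results = [None] * len(shards)
    errors = []

    def worker(index, shard):
        try:
            results[index] = forward(shard)
        except Exception as exc:  # record worker failures for the caller
            errors.append((index, exc))

    threads = [
        threading.Thread(target=worker, args=(i, shard))
        for i, shard in enumerate(shards)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    if errors:
        index, exc = errors[0]
        raise RuntimeError(f"worker {index} failed") from exc
    return results


def forward(shard):
    """Toy stand-in for a per-device forward pass."""
    return sum(shard) / len(shard)


batch = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
shards = [batch[0:4], batch[4:8]]  # one shard per simulated device
print(parallel_apply(forward, shards))  # [2.5, 6.5]
```

If `forward` touches shared state that is not thread-safe, the workers can interfere with one another, which is the concern raised above.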

bryanyzhu avatar Jan 24 '20 22:01 bryanyzhu

Maybe we can do local distributed training style instead of multithreading

zhreshold avatar Jan 27 '20 17:01 zhreshold

> Maybe we can do local distributed training style instead of multithreading

SyncBN is implemented at the operator level, which does not support distributed training. The synchronization happens within the operator and bypasses the engine.
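As background on what has to be synchronized, here is a sketch of the arithmetic in plain Python. It is not the MXNet operator; `local_stats` and `sync_mean_var` are hypothetical names. Each device contributes its count, sum, and sum of squares, and the reduction turns those into the mean and (biased) variance of the whole batch.

```python
def local_stats(values):
    """Per-device partial statistics: count, sum, and sum of squares."""
    return len(values), sum(values), sum(v * v for v in values)


def sync_mean_var(per_device):
    """Combine per-device partial statistics into whole-batch mean and variance."""
    count = sum(n for n, _, _ in per_device)
    total = sum(s for _, s, _ in per_device)
    total_sq = sum(sq for _, _, sq in per_device)
    mean = total / count
    var = total_sq / count - mean * mean  # biased variance: E[x^2] - E[x]^2
    return mean, var


device0 = [1.0, 2.0, 3.0, 4.0]
device1 = [5.0, 6.0, 7.0, 8.0]

mean, var = sync_mean_var([local_stats(device0), local_stats(device1)])
print(mean, var)  # 4.5 5.25
```

Without the reduction, each device would normalize with its own statistics (means of 2.5 and 6.5 here) instead of the shared mean of 4.5. The reduction step is the part that needs cross-device communication.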

zhanghang1989 avatar Jan 27 '20 23:01 zhanghang1989