FOTS.PyTorch

Data Parallel for multi-gpu training in mergelayer branch

Open MiZhangWhuer opened this issue 6 years ago • 2 comments

Thanks for sharing this code. I have followed your update on the mergelayer branch, where the recognition part is implemented. However, there are still some issues regarding data parallel. I debugged the code with PyCharm and found that the problem is caused by the data distribution in the feature map (see lines 34-37 in roi_rotate.py).

    for img_index, box in zip(mapping, boxes):
        feature = feature_map[img_index]  # B * H * W * C
        images.append(feature)

When switching to single-GPU mode, everything works fine. How can we fix these errors in the recognition part?
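To make the suspected failure mode concrete, here is a minimal sketch in plain Python (no PyTorch needed). It assumes that a data-parallel wrapper splits the batch of feature maps into contiguous per-GPU chunks while the box-to-image indices in `mapping` still refer to positions in the full batch. That is only one plausible reading of the problem, not a confirmed diagnosis. All helper names (`split_batch`, `gather_features`, `rebase_mapping`) are hypothetical and do not come from roi_rotate.py.

```python
# Hypothetical illustration; none of these helpers exist in roi_rotate.py.

def split_batch(batch, num_replicas):
    """Split a batch into contiguous chunks, roughly like a data-parallel scatter."""
    chunk = (len(batch) + num_replicas - 1) // num_replicas  # ceil division
    return [batch[i:i + chunk] for i in range(0, len(batch), chunk)]


def gather_features(feature_map, mapping):
    """Mimic the loop from the issue: pick one feature per box via its image index."""
    return [feature_map[img_index] for img_index in mapping]


def rebase_mapping(mapping, start, size):
    """Keep boxes whose image is in this chunk and convert to chunk-local indices."""
    return [i - start for i in mapping if start <= i < start + size]


feature_map = ["img0", "img1", "img2", "img3"]  # stand-in for a batch of feature maps
mapping = [0, 0, 1, 2, 3, 3]                    # image index of each box (batch-global)

chunks = split_batch(feature_map, num_replicas=2)

# Batch-global indices applied to a per-replica chunk: indices 0 and 1 silently
# pick the wrong images, and index 2 is out of range for a chunk of length 2.
try:
    gather_features(chunks[1], mapping)
    failed = False
except IndexError:
    failed = True

# Re-basing the indices per chunk recovers the single-device result.
per_replica = []
start = 0
for chunk in chunks:
    local = rebase_mapping(mapping, start, len(chunk))
    per_replica.append(gather_features(chunk, local))
    start += len(chunk)

merged = [f for part in per_replica for f in part]
```

In single-GPU mode the batch is never split, so the global indices stay valid. That would be consistent with the observation that everything works on one GPU.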

MiZhangWhuer avatar Dec 12 '18 13:12 MiZhangWhuer

@MiZhangWhuer The mergelayer branch is an experimental branch that tries to parallelize the roirotate layer. It is not stable and may fail, so don't rely on it. Try feature/reg-branch instead.

jiangxiluning avatar Dec 12 '18 14:12 jiangxiluning

Thanks for the rapid reply. It works.

MiZhangWhuer avatar Dec 13 '18 02:12 MiZhangWhuer