
Context switching causes multi GPU idling

Open CanyonWind opened this issue 6 years ago • 2 comments

https://github.com/CanyonWind/Single-Path-One-Shot-NAS-MXNet/blob/df1c1ffdd150695745b5abd4002be330be5d13b0/oneshot_nas_blocks.py#L166

https://github.com/CanyonWind/Single-Path-One-Shot-NAS-MXNet/blob/df1c1ffdd150695745b5abd4002be330be5d13b0/oneshot_nas_blocks.py#L467-L470

CanyonWind, Nov 05 '19 01:11

Fixed the multi-GPU idling in the ChannelSelector training stage: https://github.com/CanyonWind/Single-Path-One-Shot-NAS-MXNet/commit/b88a486a6ffe45a7e00af6d394e5ed9999e7688d https://github.com/CanyonWind/Single-Path-One-Shot-NAS-MXNet/commit/69d3f72832752c37f3600f4a1b9ee5d91ca4eb3f

CanyonWind, Nov 11 '19 23:11

@CanyonWind Hi, when I run the train_oneshot-s+ scripts, the first epoch is normal, but once the second epoch is done, the validation test hangs, GPU utilization drops to 0, and the program is stuck.

cavalleria, Jan 17 '20 10:01