simpledet
Some errors with RetinaNet
python3 detection_train.py --config config/NASFPN/retina_r50v1b_nasfpn_640_7@256_25epoch.py
produces strange errors:
File "detection_train.py", line 278, in
We'll fix this later to allocate memory dynamically. As a temporary solution, you can replace 1500 with a larger number, such as 1800, in models/retinanet/builder.py line 314.
@xchani That is not a good solution; it introduces new errors. The workspace cannot be larger than 2000, otherwise I get the error below:
mxnet.base.MXNetError: [13:57:18] pathto/mxnet/3rdparty/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:110: Check failed: err ==cudaSuccess (2 vs. 0) : Name: MapPlanKernel ErrStr:out of memory
But I have 16 GB of memory per GPU (it cannot be fully used).
Hi @Tveek, could you please share more information about your software environment, such as how you installed MXNet?
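For anyone gathering the details requested above, here is a minimal sketch of a script that collects them. The function names `run_or_none` and `collect_env` are hypothetical and not part of simpledet; the script assumes `nvcc` and `nvidia-smi` are on the PATH when present and records `None` otherwise.

```python
import platform
import shutil
import subprocess


def run_or_none(cmd):
    """Run a command and return its stdout, or None if it is unavailable or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip() or None


def collect_env():
    """Collect version details that are useful in a bug report (hypothetical helper)."""
    try:
        import mxnet
        mx_version = mxnet.__version__
    except Exception:
        # MXNet is missing or failed to load its native libraries.
        mx_version = None
    return {
        "python": platform.python_version(),
        "mxnet": mx_version,
        "nvcc": run_or_none(["nvcc", "--version"]),
        "nvidia_smi": run_or_none([
            "nvidia-smi",
            "--query-gpu=name,driver_version,memory.total",
            "--format=csv,noheader",
        ]),
    }


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Pasting the printed output into the issue gives the MXNet version, CUDA toolkit version, driver version, and total memory per GPU in one place.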
@RogerChern I followed "Install MXNet from Scratch" in the simpledet install instructions. Software versions: mxnet=1.5.0, CUDA=8.0.61, nvidia-driver=375.26. However, other networks (such as dcn, efficientnet, faster, tridentnet) run normally. My dataset is not COCO (200+ classes).
@Tveek Well, this seems to be a bug in upstream MXNet due to dropped support for the old runtime. Currently, upgrading the CUDA version seems to be the only solution.
@RogerChern What is the minimum MXNet version required for simpledet?
It is probably not a problem with the MXNet version. Using registry.cn-beijing.aliyuncs.com/rogerchen/simpledet:cuda10 also raises the above problem.
Bug confirmed. It seems that if we allocate more than 2000 MB of workspace, MXNet always raises OOM. @xchani
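Until the allocation is made dynamic, one way to avoid tripping this is to clamp whatever workspace value is requested before it reaches the builder. The following is only a sketch: `pick_workspace_mb` and both constants are hypothetical names, and the 1500 and 2000 figures are the values reported in this thread, not documented MXNet limits.

```python
# Values reported in this thread; observations, not documented MXNet limits.
DEFAULT_WORKSPACE_MB = 1500  # value currently hard-coded in the builder
OBSERVED_LIMIT_MB = 2000     # requests above this were reported to raise OOM


def pick_workspace_mb(requested_mb=DEFAULT_WORKSPACE_MB, limit_mb=OBSERVED_LIMIT_MB):
    """Return a workspace size in MB, capped at the observed limit (hypothetical helper)."""
    if requested_mb <= 0:
        raise ValueError("workspace size must be a positive number of MB")
    return min(requested_mb, limit_mb)


if __name__ == "__main__":
    print(pick_workspace_mb(1800))  # within the observed limit, returned unchanged
    print(pick_workspace_mb(2500))  # above the observed limit, capped
```

This does not fix the underlying problem; it only keeps a too-large setting from turning into the OOM error above, which may still leave the workspace too small for datasets with many classes.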