second.pytorch
train.py -> "TypeError: 'numpy.float64' object cannot be interpreted as an integer" and "TypeError: Object of type 'ndarray' is not JSON serializable"
runtime.step=2250, runtime.steptime=0.2624, runtime.voxel_gene_time=0.001374, runtime.prep_time=0.05496, loss.cls_loss=0.2967, loss.cls_loss_rt=0.3549, loss.loc_loss=0.5239, loss.loc_loss_rt=0.5036, loss.loc_elem=[0.008796, 0.01199, 0.1004, 0.0109, 0.03556, 0.01613, 0.06806], loss.cls_pos_rt=0.2487, loss.cls_neg_rt=0.1062, loss.dir_rt=0.4927, rpn_acc=0.9993, pr.prec@10=0.0732, pr.rec@10=0.8572, pr.prec@30=0.5539, pr.rec@30=0.5002, pr.prec@50=0.9257, pr.rec@50=0.1765, pr.prec@70=0.995, pr.rec@70=0.004043, pr.prec@80=0.0, pr.rec@80=0.0, pr.prec@90=0.0, pr.rec@90=0.0, pr.prec@95=0.0, pr.rec@95=0.0, misc.num_vox=30752, misc.num_pos=60, misc.num_neg=70263, misc.num_anchors=70400, misc.lr=0.0005046, misc.mem_usage=25.9
runtime.step=2300, runtime.steptime=0.2481, runtime.voxel_gene_time=0.001561, runtime.prep_time=0.0813, loss.cls_loss=0.2928, loss.cls_loss_rt=0.2211, loss.loc_loss=0.5189, loss.loc_loss_rt=0.3877, loss.loc_elem=[0.008733, 0.009574, 0.02625, 0.01872, 0.04538, 0.0329, 0.05227], loss.cls_pos_rt=0.1724, loss.cls_neg_rt=0.04868, loss.dir_rt=0.5206, rpn_acc=0.9993, pr.prec@10=0.07425, pr.rec@10=0.8598, pr.prec@30=0.5595, pr.rec@30=0.5073, pr.prec@50=0.9269, pr.rec@50=0.1837, pr.prec@70=0.9964, pr.rec@70=0.004993, pr.prec@80=0.0, pr.rec@80=0.0, pr.prec@90=0.0, pr.rec@90=0.0, pr.prec@95=0.0, pr.rec@95=0.0, misc.num_vox=34000, misc.num_pos=59, misc.num_neg=70244, misc.num_anchors=70400, misc.lr=0.0005165, misc.mem_usage=25.9
#################################
EVAL
#################################
Generate output labels...
[100.0%][===================>][20.00it/s][01:44>00:00]
generate label finished(35.85/s). start eval:
/opt/conda/lib/python3.6/site-packages/numba/core/typed_passes.py:327: NumbaPerformanceWarning:
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.
To find out why, try turning on parallel diagnostics, see https://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.
File "../utils/eval.py", line 129: @numba.jit(nopython=True, parallel=True) def box3d_overlap_kernel(boxes, ^
state.func_ir.loc)) Traceback (most recent call last): File "/data/second.pytorch/second/pytorch/train.py", line 407, in train detections, str(result_path_step)) File "/data/second.pytorch/second/data/kitti_dataset.py", line 149, in evaluation z_center=z_center) File "/data/second.pytorch/second/utils/eval.py", line 884, in get_coco_eval_result z_center=z_center) File "/data//second.pytorch/second/utils/eval.py", line 704, in do_coco_style_eval min_overlaps[:, i, j] = np.linspace(*overlap_ranges[:, i, j]) File "<array_function internals>", line 6, in linspace File "/opt/conda/lib/python3.6/site-packages/numpy/core/function_base.py", line 113, in linspace num = operator.index(num) TypeError: 'numpy.float64' object cannot be interpreted as an integer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/second.pytorch/second/pytorch/train.py", line 678, in
I'm experiencing a similar issue. It reproduces on Docker. Dockerfile.txt
+ python ./pytorch/train.py evaluate --config_path=/mnt/host/vol/second.pytorch/second/configs/all.fhd.config.fixed --model_dir=/mnt/host/vol/second.pytorch/second/configs/pointpillars/model_result --measure_time=True --batch_size=1
/miniconda/envs/py38/lib/python3.8/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_CUDA_DRIVER=/usr/lib/x86_64-linux-gnu/libcuda.so.
For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
/miniconda/envs/py38/lib/python3.8/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.
For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
/miniconda/envs/py38/lib/python3.8/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice.
For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
[ 41 1280 1056]
feature_map_size [1, 160, 132]
remain number of infos: 3769
Generate output labels...
[100.0%][===================>][10.32it/s][06:16>00:00]
generate label finished(9.99/s). start eval:
avg example to torch time: 5.093 ms
avg prep time: 6.488 ms
avg voxel_feature_extractor time = 0.494 ms
avg middle forward time = 50.746 ms
avg rpn forward time = 13.280 ms
avg predict time = 23.484 ms
/miniconda/envs/py38/lib/python3.8/site-packages/numba/core/typed_passes.py:326: NumbaPerformanceWarning:
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.
To find out why, try turning on parallel diagnostics, see https://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.
File "utils/eval.py", line 129:
@numba.jit(nopython=True, parallel=True)
def box3d_overlap_kernel(boxes,
^
warnings.warn(errors.NumbaPerformanceWarning(msg,
Traceback (most recent call last):
File "./pytorch/train.py", line 663, in <module>
fire.Fire()
File "/miniconda/envs/py38/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/miniconda/envs/py38/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/miniconda/envs/py38/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "./pytorch/train.py", line 540, in evaluate
result_dict = eval_dataset.dataset.evaluation(detections,
File "/mnt/host/vol/second.pytorch/second/data/kitti_dataset.py", line 144, in evaluation
result_coco = get_coco_eval_result(
File "/mnt/host/vol/second.pytorch/second/utils/eval.py", line 877, in get_coco_eval_result
mAPbbox, mAPbev, mAP3d, mAPaos = do_coco_style_eval(
File "/mnt/host/vol/second.pytorch/second/utils/eval.py", line 704, in do_coco_style_eval
min_overlaps[:, i, j] = np.linspace(*overlap_ranges[:, i, j])
File "<__array_function__ internals>", line 5, in linspace
File "/miniconda/envs/py38/lib/python3.8/site-packages/numpy/core/function_base.py", line 120, in linspace
num = operator.index(num)
TypeError: 'numpy.float64' object cannot be interpreted as an integer
This occurs because of the numpy version. If you install numpy==1.17.4, this error will not occur.
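For background, here is a minimal sketch of the behavior change behind this error (the linspace arguments below are illustrative values, not taken from the repository): overlap_ranges is a float array, so the sample count unpacked from it reaches np.linspace as a numpy.float64. numpy 1.17 and earlier coerce it (possibly with a DeprecationWarning), while numpy 1.18 and later pass it through operator.index and raise the TypeError shown above.

import numpy as np

count = np.float64(10.0)  # sample count read from a float array, as with overlap_ranges

try:
    # numpy <= 1.17: works (possibly with a DeprecationWarning)
    # numpy >= 1.18: TypeError: 'numpy.float64' object cannot be interpreted as an integer
    print(np.linspace(0.5, 0.95, count))
except TypeError as err:
    print(err)

# Casting the count explicitly behaves the same on every numpy version:
print(np.linspace(0.5, 0.95, int(count)))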
Thank you for your advice! Today I downgraded numpy to 1.16, also following this article. Specifically, I added the following line to the end of the Dockerfile:
RUN conda install -c conda-forge numpy=1.16.2
And the error I posted above was solved.
However, a new error has appeared in its place:
[ 41 1280 1056]
feature_map_size [1, 160, 132]
remain number of infos: 3769
Generate output labels...
[7.005%][>...................][0.31it/s][03:03>03:06:20]
Traceback (most recent call last):
File "./pytorch/train.py", line 663, in <module>
fire.Fire()
File "/miniconda/envs/py37/lib/python3.7/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/miniconda/envs/py37/lib/python3.7/site-packages/fire/core.py", line 471, in _Fire
target=component.__name__)
File "/miniconda/envs/py37/lib/python3.7/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "./pytorch/train.py", line 524, in evaluate
detections += net(example)
File "/miniconda/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/host/vol/second.pytorch/second/pytorch/models/voxelnet.py", line 363, in forward
preds_dict = self.network_forward(voxels, num_points, coors, batch_size_dev)
File "/mnt/host/vol/second.pytorch/second/pytorch/models/voxelnet.py", line 332, in network_forward
voxel_features, coors, batch_size)
File "/miniconda/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/host/vol/second.pytorch/second/pytorch/models/middle.py", line 203, in forward
ret = self.middle_conv(ret)
File "/miniconda/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/envs/py37/lib/python3.7/site-packages/spconv/modules.py", line 133, in forward
input = module(input)
File "/miniconda/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/envs/py37/lib/python3.7/site-packages/spconv/conv.py", line 192, in forward
outids.shape[0])
File "/miniconda/envs/py37/lib/python3.7/site-packages/spconv/functional.py", line 83, in forward
return ops.indice_conv(features, filters, indice_pairs, indice_pair_num, num_activate_out, False, True)
File "/miniconda/envs/py37/lib/python3.7/site-packages/spconv/ops.py", line 116, in indice_conv
int(inverse), int(subm))
File "/miniconda/envs/py37/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
Oh... sorry to hear that.
This new issue is one I haven't faced, so I'm afraid I can't help much there.
No worries. Your advice solved one of the problems I'm facing. Thank you!
I realized that the elapsed time from launch to the crash varies. Perhaps this problem is caused by my weak environment: a 1st-gen Core i3, a GTX 1050, and only 4 GB of RAM.
I found this Japanese article, which says the solution is to increase shared memory by adding
shm_size: '2gb'
to the docker-compose.yml.
This solution seems to work so far (still running without a crash).
It worked! (Almost everything is 0.00, but that may be caused by a different kind of issue.)
Evaluation official
Car AP(Average Precision)@0.70, 0.70, 0.70:
bbox AP:0.00, 0.00, 0.00
bev AP:0.00, 0.00, 0.00
3d AP:0.00, 0.00, 0.00
aos AP:0.00, 0.00, 0.00
Car AP(Average Precision)@0.70, 0.50, 0.50:
. . .
Van coco AP@0.50:0.05:0.95:
bbox AP:0.00, 0.00, 0.00
bev AP:0.00, 0.00, 0.00
3d AP:0.00, 0.00, 0.00
aos AP:0.00, 0.00, 0.00
This issue occurred because of running out of memory, right?
I guess so. It worked after adding shm_size: '2gb' to the docker-compose.yml. I didn't change any other part of docker-compose.yml.txt or Dockerfile.txt.
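For reference, a minimal docker-compose.yml sketch of where that line sits; the service name and the build key below are placeholders rather than contents of the attached docker-compose.yml.txt:

services:
  second:               # placeholder service name
    build: .            # placeholder; keep whatever build/image setting the original file uses
    shm_size: '2gb'     # enlarges /dev/shm, which PyTorch DataLoader workers use for sharing tensors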
Hi, I ran into a similar error. How can I apply that fix in a non-Docker environment? I run it on my computer with one RTX 2060 on Ubuntu 18.04.
You can change the eval.py file instead of changing the numpy version: /data//second.pytorch/second/utils/eval.py, line 704:
for i in range(overlap_ranges.shape[1]):
    for j in range(overlap_ranges.shape[2]):
        a, b, c = overlap_ranges[:, i, j]  # unpack start, stop, and sample count
        min_overlaps[:, i, j] = np.linspace(a, b, int(c))  # cast the count to a Python int
        # min_overlaps[:, i, j] = np.linspace(*overlap_ranges[:, i, j])
Changing it like this might solve the problem.
It worked! (Almost everything is 0.00, but that may be caused by a different kind of issue.)
Evaluation official Car AP(Average Precision)@0.70, 0.70, 0.70: bbox AP:0.00, 0.00, 0.00 bev AP:0.00, 0.00, 0.00 3d AP:0.00, 0.00, 0.00 aos AP:0.00, 0.00, 0.00 Car AP(Average Precision)@0.70, 0.50, 0.50: . . . Van coco AP@0.50:0.05:0.95: bbox AP:0.00, 0.00, 0.00 bev AP:0.00, 0.00, 0.00 3d AP:0.00, 0.00, 0.00 aos AP:0.00, 0.00, 0.00
Hi @WesternHill ,
Were you able to solve the eval 0.00 problem? Did you get any evaluation results? I am facing the same problem.
If you are running any object detection and facing this issue, it may be because of version conflicts in 'pycocotools'. Uninstall and reinstall it, and the problem should be solved:
pip uninstall pycocotools
pip install pycocotools