BigDL-2.x
When I use Cluster Serving to run streaming inference with a face-detection model, the result is wrong.
When I use Cluster Serving to run streaming inference with a face-detection model, the result is wrong. When I write one input and then call time.sleep(3) before writing the next, the result is right.
Hi @gtfaiwxm, as we discussed, please paste more info about the model and the input/output. Thanks.
Hi @glorysdj, the face-detection-0100 model was downloaded from the OpenVINO model zoo (2020.2 version). The net outputs a blob with shape [1, 1, N, 7], where N is the number of detected bounding boxes. For each detection, the description has the format [image_id, label, conf, x_min, y_min, x_max, y_max], where:
- image_id - ID of the image in the batch
- label - predicted class ID
- conf - confidence for the predicted class
- (x_min, y_min) - coordinates of the top left bounding box corner
- (x_max, y_max) - coordinates of the bottom right bounding box corner

When I keep writing image data without pausing, the results are:
{'image-4-test_1.jpg': array([[[14. , 1. , 0.02250869, ..., 0. , 0. , 0. ], [14. , 1. , 0.02248706, ..., 0. , 0. , 0. ], [14. , 1. , 0.022426 , ..., 0. , 0. , 0. ], ..., [15. , 1. , 0.02205388, ..., 0. , 0. , 0. ], [15. , 1. , 0.02199526, ..., 0. , 0. , 0. ], [15. , 1. , 0.02189598, ..., 0. , 0. , 0. ]]], dtype=float32),
'image-1-test_1.jpg': array([[[0.0000000e+00, 1.0000000e+00, 9.9770606e-01, ..., 5.1740110e-03, 7.5382727e-01, 8.2223165e-01], [1.0000000e+00, 1.0000000e+00, 9.9770606e-01, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], [1.0000000e+00, 1.0000000e+00, 9.9604505e-01, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], ..., [1.0000000e+01, 1.0000000e+00, 9.5114928e-01, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], [1.0000000e+01, 1.0000000e+00, 9.4461030e-01, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], [1.0000000e+01, 1.0000000e+00, 9.3502581e-01, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00]]], dtype=float32),
'image-9-test_1.jpg': array([[[20. , 1. , 0.02957392, ..., 0. , 0. , 0. ], [20. , 1. , 0.02924488, ..., 0. , 0. , 0. ], [20. , 1. , 0.02915918, ..., 0. , 0. , 0. ], ..., [21. , 1. , 0.02574502, ..., 0. , 0. , 0. ], [21. , 1. , 0.02572953, ..., 0. , 0. , 0. ], [21. , 1. , 0.02556236, ..., 0. , 0. , 0. ]]], dtype=float32),
'image-16-test_1.jpg': array([[[13. , 1. , 0.0230323 , ..., 0. , 0. , 0. ], [13. , 1. , 0.02297049, ..., 0. , 0. , 0. ], [13. , 1. , 0.02296207, ..., 0. , 0. , 0. ], ..., [14. , 1. , 0.02266836, ..., 0. , 0. , 0. ], [14. , 1. , 0.02261219, ..., 0. , 0. , 0. ], [14. , 1. , 0.02260978, ..., 0. , 0. , 0. ]]], dtype=float32),
'image-11-test_1.jpg': array([[[22. , 1. , 0.02645806, ..., 0. , 0. , 0. ], [22. , 1. , 0.02644061, ..., 0. , 0. , 0. ], [22. , 1. , 0.02641329, ..., 0. , 0. , 0. ], ..., [23. , 1. , 0.02858338, ..., 0. , 0. , 0. ], [23. , 1. , 0.02836005, ..., 0. , 0. , 0. ], [23. , 1. , 0.02832798, ..., 0. , 0. , 0. ]]], dtype=float32),
'image-15-test_1.jpg': array([[[10. , 1. , 0.93154913, ..., 0. , 0. , 0. ], [10. , 1. , 0.8998288 , ..., 0. , 0. , 0. ], [10. , 1. , 0.8828549 , ..., 0. , 0. , 0. ], ..., [13. , 1. , 0.02323997, ..., 0. , 0. , 0. ], [13. , 1. , 0.02311641, ..., 0. , 0. , 0. ], [13. , 1. , 0.02307337, ..., 0. , 0. , 0. ]]], dtype=float32)}

When I write image data with time.sleep(3) between writes, the results are:
{'image-4-test_1.jpg': array([[[0. , 1. , 0.99770606, ..., 0.00517401, 0.7538273 , 0.82223165], [1. , 1. , 0.12527953, ..., 0. , 0. , 0. ], [1. , 1. , 0.11860519, ..., 0. , 0. , 0. ], ..., [2. , 1. , 0.10363458, ..., 0. , 0. , 0. ], [2. , 1. , 0.0918026 , ..., 0. , 0. , 0. ], [2. , 1. , 0.08697815, ..., 0. , 0. , 0. ]]], dtype=float32),
'image-1-test_1.jpg': array([[[0. , 1. , 0.99770606, ..., 0.00517401, 0.7538273 , 0.82223165], [1. , 1. , 0.12527953, ..., 0. , 0. , 0. ], [1. , 1. , 0.11860519, ..., 0. , 0. , 0. ], ..., [2. , 1. , 0.10363458, ..., 0. , 0. , 0. ], [2. , 1. , 0.0918026 , ..., 0. , 0. , 0. ], [2. , 1. , 0.08697815, ..., 0. , 0. , 0. ]]], dtype=float32),
'image-9-test_1.jpg': array([[[0. , 1. , 0.99770606, ..., 0.00517401, 0.7538273 , 0.82223165], [1. , 1. , 0.12527953, ..., 0. , 0. , 0. ], [1. , 1. , 0.11860519, ..., 0. , 0. , 0. ], ..., [2. , 1. , 0.10363458, ..., 0. , 0. , 0. ], [2. , 1. , 0.0918026 , ..., 0. , 0. , 0. ], [2. , 1. , 0.08697815, ..., 0. , 0. , 0. ]]], dtype=float32),
'image-16-test_1.jpg': array([[[0. , 1. , 0.99770606, ..., 0.00517401, 0.7538273 , 0.82223165], [1. , 1. , 0.12527953, ..., 0. , 0. , 0. ], [1. , 1. , 0.11860519, ..., 0. , 0. , 0. ], ..., [2. , 1. , 0.10363458, ..., 0. , 0. , 0. ], [2. , 1. , 0.0918026 , ..., 0. , 0. , 0. ], [2. , 1. , 0.08697815, ..., 0. , 0. , 0. ]]], dtype=float32),
'image-11-test_1.jpg': array([[[0. , 1. , 0.99770606, ..., 0.00517401, 0.7538273 , 0.82223165], [1. , 1. , 0.12527953, ..., 0. , 0. , 0. ], [1. , 1. , 0.11860519, ..., 0. , 0. , 0. ], ..., [2. , 1. , 0.10363458, ..., 0. , 0. , 0. ], [2. , 1. , 0.0918026 , ..., 0. , 0. , 0. ], [2. , 1. , 0.08697815, ..., 0. , 0. , 0. ]]], dtype=float32),
'image-15-test_1.jpg': array([[[0. , 1. , 0.99770606, ..., 0.00517401, 0.7538273 , 0.82223165], [1. , 1. , 0.12527953, ..., 0. , 0. , 0. ], [1. , 1. , 0.11860519, ..., 0. , 0. , 0. ], ..., [2. , 1. , 0.10363458, ..., 0. , 0. , 0. ], [2. , 1. , 0.0918026 , ..., 0. , 0. , 0. ], [2. , 1. , 0.08697815, ..., 0. , 0. , 0. ]]], dtype=float32),
'image-17-test_1.jpg': array([[[0. , 1. , 0.99770606, ..., 0.00517401, 0.7538273 , 0.82223165], [1. , 1. , 0.12527953, ..., 0. , 0. , 0. ], [1. , 1. , 0.11860519, ..., 0. , 0. , 0. ], ..., [2. , 1. , 0.10363458, ..., 0. , 0. , 0. ], [2. , 1. , 0.0918026 , ..., 0. , 0. , 0. ], [2. , 1. , 0.08697815, ..., 0. , 0. , 0. ]]], dtype=float32)}
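
For readers comparing the two dumps, the [1, 1, N, 7] layout described above can be unpacked with a small helper. This is only a sketch: the names `Detection` and `parse_detections` and the 0.5 confidence threshold are illustrative assumptions, not part of Cluster Serving or the model. It accepts a NumPy array or plain nested lists.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    """One row of the [1, 1, N, 7] output blob."""
    image_id: int
    label: int
    conf: float
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_detections(blob, conf_threshold: float = 0.5) -> List[Detection]:
    """Return the rows of a [1, 1, N, 7] blob whose confidence is at least conf_threshold.

    conf_threshold is an assumed cutoff chosen for illustration only.
    """
    detections = []
    for row in blob[0][0]:
        # Each row is [image_id, label, conf, x_min, y_min, x_max, y_max].
        image_id, label, conf, x_min, y_min, x_max, y_max = (float(v) for v in row)
        if conf >= conf_threshold:
            detections.append(
                Detection(int(image_id), int(label), conf, x_min, y_min, x_max, y_max)
            )
    return detections
```

Applied to each value of the result dictionaries, this makes it easier to see which responses contain a high-confidence detection and which image_id it is attributed to.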
We have reproduced this error and are now trying to fix it.
Hi @glorysdj, has this problem been resolved?
Hi @gtfaiwxm, we are testing the fix now. The fix will be merged soon. Thanks.
Hi @gtfaiwxm, our tests show that this model does not predict correctly with batched inference in Cluster Serving, while other models do, so we added an entry in config.yaml that lets users disable batch inference to resolve the problem.
@gtfaiwxm We are currently fixing this issue. It may well be located in Analytics Zoo core and may take some time to confirm.
As a workaround, you could set core_num: 1 in the config file. This costs some performance, but the results will be correct.
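
One way to check whether the workaround takes effect is to send the same image repeatedly, as in the report above, and compare every returned blob against a known-good one (for example, a result obtained with the time.sleep(3) approach). The sketch below assumes the results arrive as a dictionary from URI to a [1, 1, N, 7] blob, as in the dumps earlier in this thread; `blobs_match`, `mismatched_uris`, and the 1e-4 tolerance are hypothetical names and values chosen for illustration.

```python
import math


def blobs_match(a, b, tol: float = 1e-4) -> bool:
    """Compare two [1, 1, N, 7] blobs element by element within an absolute tolerance."""
    rows_a, rows_b = a[0][0], b[0][0]
    if len(rows_a) != len(rows_b):
        return False
    return all(
        math.isclose(float(x), float(y), abs_tol=tol)
        for row_a, row_b in zip(rows_a, rows_b)
        for x, y in zip(row_a, row_b)
    )


def mismatched_uris(results, reference, tol: float = 1e-4):
    """Return the sorted URIs whose blob differs from the known-good reference blob."""
    return sorted(
        uri for uri, blob in results.items() if not blobs_match(blob, reference, tol)
    )
```

An empty list from `mismatched_uris` would suggest that every request for the same image produced the same detections; any URI it returns points to a response worth inspecting.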
This bug is fixed in #3690; you could try it again with the nightly version.
@gtfaiwxm Any more questions on this issue? If not, we may close it soon.