node-opencv
terminating with uncaught exception when running in parallel
Whenever I try to process a large number of images (it usually happens around image 150; I'm processing at most 16 in parallel), node-opencv crashes and brings down the entire node process.
OpenCV Error: Bad flag (parameter or structure field) (Unrecognized or unsupported array type) in cvGetMat, file /tmp/opencv-rp0FJY/opencv-2.4.9/modules/core/src/array.cpp, line 2482
libc++abi.dylib: terminating with uncaught exception of type cv::Exception: /tmp/opencv-rp0FJY/opencv-2.4.9/modules/core/src/array.cpp:2482: error: (-206) Unrecognized or unsupported array type in function cvGetMat
If I decrease my parallel limit to 1, everything works as expected.
What functions are you calling? I've tried processing a couple of images in parallel through a REST API (which I built myself on top of node-opencv) and it worked just fine.
I'm calling readImage and detectObject with a buffer that I provide, e.g.
var async = require('async');
var opencv = require('opencv');

function hasFaces (buffer, cb) {
  opencv.readImage(buffer, function (err, image) {
    if (err) { return cb(err); }
    image.detectObject(opencv.FACE_CASCADE, {}, function (err, faces) {
      if (err) { return cb(err); }
      cb(null, (faces.length > 0));
    });
  });
}
This function is then mapped over a set of ~2000 buffers using async, with a limit of 16 jobs in parallel.
async.mapLimit(buffers, 16, hasFaces, function (err, results) {
  if (err) { throw err; }
  // results is an array of Booleans
});
I first ran your example, but instead of feeding it the buffers, I read images from my filesystem, and it worked just fine. Then I looked at the error message closely and realized that it's not a cascade-detection issue at all:
Unrecognized or unsupported array type
OpenCV basically fails to decode your buffer into a valid image. Can you specify how you're populating your buffers?
I remember that I had to create a new (binary) Buffer before sending it off to node-opencv for proper decoding; it failed for me otherwise. Here's how I did it:
// read your image in whatever way, e.g. by fetching the raw file from http://abc.com/img.png
var opencvBuffer = new Buffer(normalBuffer, 'binary');
opencv.readImage(opencvBuffer, function (err, image) { /* ... */ });
Hope this helps.
I just tried again and I'm very sure that the buffers are alright.
I've tried rerunning the script numerous times, only changing the number of jobs that run in parallel. It also seems that the higher the concurrency, the earlier it crashes; dragging it up to 128 makes it crash even before it has started all of those jobs.
I have a total of 1798 images, as buffers loaded into memory, which makes it easy for me to test this out. How many files do you have? Could you try something like the code below, adjusting the CONCURRENCY between runs?
var fs = require('fs');
var async = require('async');
var opencv = require('opencv');

var CONCURRENCY = 16;

function hasFaces (buffer, cb) {
  opencv.readImage(buffer, function (err, image) {
    if (err) { return cb(err); }
    image.detectObject(opencv.FACE_CASCADE, {}, function (err, faces) {
      if (err) { return cb(err); }
      cb(null, (faces.length > 0));
    });
  });
}

var list = fs.readdirSync('./photos');
var buffers = list.map(function (name) {
  // readdirSync returns bare file names, so prepend the directory
  return fs.readFileSync('./photos/' + name);
});

async.eachLimit(buffers, CONCURRENCY, hasFaces, function (err) {
  if (err) { throw err; }
  console.log('IT WORKS!');
});
Thank you for taking time with this :)
I also tried removing the image.detectObject call and only loading the buffers into opencv. It worked and didn't crash a single time.
I don't know if this is relevant, but I might have found a bug in the C++ code.
Detecting faces is done with the help of uv_queue_work, which basically means that we have three functions running:

1. The one that starts everything. This is called directly from JavaScript and runs on the main loop; in this case it comes from the call to detectObject() on our image object. This function collects all the data needed and puts it onto a baton object, which is accessible from all three functions.

2. The one that does all the actual work; it runs on a background thread. Among other things, it does three things: it creates a new vector, std::vector<cv::Rect> objects; it calls OpenCV's detectMultiScale, which populates the vector; and it sets a pointer on the baton to point to the vector, baton->res = objects. The problem here is that objects was only allocated for the duration of this function and will now be freed; its memory might at any point be reallocated and overwritten with something else.

3. The one that passes everything back to JavaScript land. It constructs an array out of baton->res and passes it to the callback function inside Node.js. The problem here is that ->res might contain garbage, and we have no idea what we are returning (see the sketch after this list).
The only problem is that the error message doesn't really seem to indicate this exact problem. It's more as if there is a similar problem, but with the image data being freed and then overwritten.
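To make the lifetime bug concrete, here is a minimal sketch of the pattern described above. The names (DetectBaton, DetectWork) are illustrative rather than node-opencv's actual ones, and it assumes the baton holds a pointer to the result vector:

#include <uv.h>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/objdetect/objdetect.hpp>

struct DetectBaton {
  cv::Mat image;
  cv::CascadeClassifier* cascade;
  std::vector<cv::Rect>* res;  // ends up pointing into a dead stack frame
};

// The second function: runs on a libuv background thread.
void DetectWork (uv_work_t* req) {
  DetectBaton* baton = static_cast<DetectBaton*>(req->data);
  std::vector<cv::Rect> objects;  // allocated on this thread's stack
  baton->cascade->detectMultiScale(baton->image, objects);
  baton->res = &objects;  // the pointer escapes the function...
}                         // ...but the vector is destroyed right here

// By the time the third function runs on the main loop, *baton->res is
// dangling and may have been overwritten by anything.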
Even more strangely, the code snippet you shared executes alright on my machine with OpenCV 2.4.9, but for the exact same image, copied ~150 times into the folder and containing just one face, detectObject could only find the face in 92 of them, which means that the buffer does indeed get overwritten randomly.
@peterbraden do you think it would impact performance if we combined the cascade load, the image load and the running of the classifier into a single function? I'll modify it, try it out and create a pull request if that works.
@LinusU your assessment makes sense, and the error does give us the reason, but not exactly where and how it happens :) I'll ping on this thread once I fix it.
@LinusU I need your advice: instead of loading the cascade file for every buffer you send off, if you want to determine whether or not a batch of images have faces in them, it would make a lot more sense to load the cascade just once and then run the cascade classifier on all of them at once (first just single-threaded CPU, and then maybe multi-threaded CPU). Thoughts?

For this, I propose a new function, say opencv.detectObjectBatch(buffers, cascadeFile, function(err, [{faces_in_img0},...]). I am assuming the current code wanted to do something similar.
I'm not too familiar with opencv, but from my understanding the code currently just loads the cascade file once, inside the CascadeClassifierWrap constructor. If you look at the function matrix.detectObject in lib/opencv.js, you'll see that it caches the instantiation for each classifier value, so I think that this is already done.
The only gain I can see with a detectObjectBatch function would be that it would only require one call into C++, and one call back. However, I don't think that this is a necessary optimization, since detecting the faces is going to take way longer than that. We would also need a good way to deal with errors: do we throw away all the results if the last one gives an error?

I also think that the use case will be limited, since having all the photos in memory at the same time is quite bad for performance. That's why I was limiting the whole thing to 16 parallel jobs to begin with. Since photo collections can often have thousands of photos in them, it's just not a good idea to load them all into memory, then send them all into C++ land, then scan them all, one by one, to finally be able to send back a long array of booleans.
The problem I talked about could be solved with a simple malloc and free though, but I'm sensing that the same problem exists in more than one place. It would be really good to add a test case for this.
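Continuing the sketch from above (same illustrative names, not the actual node-opencv source), the fix amounts to giving the result vector storage that outlives the worker thread, and releasing it only after the results have been handed back to JavaScript:

// Runs on the background thread.
void DetectWork (uv_work_t* req) {
  DetectBaton* baton = static_cast<DetectBaton*>(req->data);
  baton->res = new std::vector<cv::Rect>();  // heap-allocated, survives the worker
  baton->cascade->detectMultiScale(baton->image, *baton->res);
}

// Runs back on the main loop once the worker has finished.
void DetectAfter (uv_work_t* req, int status) {
  DetectBaton* baton = static_cast<DetectBaton*>(req->data);
  // ... build the JavaScript array from *baton->res and invoke the callback ...
  delete baton->res;  // freed only once we are done with it
  delete baton;
}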
Okay, so I added a test case (I didn't really get along with vows; can't I use assert in an async function? wtf?) and I actually got a segmentation fault, which was very nice! It shows that I'm on the right track. Most of the time I get the usual OpenCV Error: ...
I'm just going to clean up the code a bit and submit a patch with an added test.
+1
+1, still experiencing this with OpenCV 2.4.13 while calling detectObject on 13 images total, async.
OK, so the best way to avoid this error, and to avoid having your Node.js process crash (sometimes silently, with no error!), is to completely avoid async calls to detectObject.

Even though detectObject is an async function, you have to call it synchronously (call it, wait for it to finish, and only call it one at a time), which means you'll probably have to process images sequentially.
EDIT: as @LinusU pointed out, this seems to be a bug with opencv.
I have also stumbled upon this; node-opencv can't handle parallel work on multiple images at the same time.
@timbrandin: see the comment at the bottom of https://github.com/peterbraden/node-opencv/pull/223 for advice on how to update detectObject to not run images in parallel; pull requests invited. Basically, there are some functions in OpenCV which depend upon persistent data in their instance. Where this is the case, you can't run two of these functions at the same time in different threads, so adding a mutex to node-opencv's representation of the class, and then protecting the function concerned in the async thread, solves the issue. Be aware that this has been done for some async functions in the master branch (npm install peterbraden/node-opencv; they are not in the published npm version 6.0.0).
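For reference, the mutex approach boils down to something like the following sketch. The names are again illustrative, not the actual node-opencv source, and it assumes the baton carries a pointer to its wrapped classifier and a heap-allocated result vector as in the earlier sketches. The lock lives on the wrapped instance, and the async worker holds it for the duration of the non-thread-safe call:

#include <uv.h>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/objdetect/objdetect.hpp>

struct CascadeClassifierWrap {
  cv::CascadeClassifier cascade;  // keeps per-instance state across calls
  uv_mutex_t mutex;               // initialised with uv_mutex_init() in the constructor
};

struct DetectBaton {
  cv::Mat image;
  CascadeClassifierWrap* wrap;
  std::vector<cv::Rect>* res;  // heap-allocated before the work is queued
};

// Runs on a background thread; several workers may share the same wrap.
void DetectWork (uv_work_t* req) {
  DetectBaton* baton = static_cast<DetectBaton*>(req->data);
  uv_mutex_lock(&baton->wrap->mutex);  // serialise access to this classifier
  baton->wrap->cascade.detectMultiScale(baton->image, *baton->res);
  uv_mutex_unlock(&baton->wrap->mutex);
}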