
Intermittent Segmentation Fault

Open neil-benn opened this issue 4 years ago • 13 comments

Hello,

I'm writing an interface to a MIPI camera and have written a C++ facade over the MIPI driver to simplify the calls, then compiled this into a jar with JavaCPP. The Java class is as follows:

package com.ziath.ardugrab;

import org.bytedeco.javacpp.Loader;
import org.bytedeco.javacpp.Pointer; 
import org.bytedeco.javacpp.annotation.Namespace;
import org.bytedeco.javacpp.annotation.Platform;

@Platform(link = {"arducam_mipicamera", "ardugrab"}, include = {"ardugrab.h", "arducam_mipicamera.h"})
@Namespace("ArduGrabLibrary")
public class ArduGrab extends Pointer {

    static { Loader.load(); }

    public static int DEFAULT_EXPOSURE = 1500;
    public static int DEFAULT_FOCUS = 400;
    public static int DEFAULT_WIDTH = 4656;
    public static int DEFAULT_HEIGHT = 3496;
    public static int CAMERA_NOT_CONNECTED = -1303;

    public ArduGrab() { allocate(); }
    private native void allocate();
    public native int initCamera();
    public native int closeCamera();
    public native void setSim(boolean sim);
    public native void setDebug(boolean debug);
    public native int setExposure(int exposure);
    public native int getExposure();
    public native int setFocus(int focus);
    public native int getFocus();
    public native int setResolution(int width, int height);
    public native int getResolutionWidth();
    public native int getResolutionHeight();

}

I'm then testing the underlying native shared library with the following code:

#include "ardugrab.h"

using namespace ArduGrabLibrary;

int main() {
    ArduGrab* ag = new ArduGrab();
    ag->setDebug(true);
    ag->setSim(false);
    ag->initCamera();
    ag->setExposure(3000);
    ag->setFocus(600);
    ag->setResolution(4656, 3496);
    ag->closeCamera();
    return 0;
}

I can execute this 1,000 times with no problem. However, I've got the same thing running through JavaCPP:

import com.ziath.ardugrab.ArduGrab;

public class TestArduGrab {
    public static void main(String args[]) {
        ArduGrab ag = new ArduGrab();
        ag.setDebug(true);
        ag.setSim(false);
        ag.initCamera();
        ag.setExposure(3000);
        ag.setFocus(600);
        ag.setResolution(4656, 3496);
        ag.closeCamera();
        ag.close();
    }
}

This gives me either a segmentation fault or a JVM crash about 1 in 5 times. I'm not sure how to go about testing this, as there is clearly something about running through the JavaCPP-generated jar that is causing the issue. Can you provide any tips on how to start looking into this? I note that if I remove the calls to the underlying MIPI driver library, this does not happen, so it is something to do with referencing the third-party library. I need to understand how the environment differs between running through JavaCPP and running 'raw' native code to determine why I'm getting seg faults.

I understand that this is very specific to our case. If required, I would be more than happy to pay for any assistance, as I'm sure you get this kind of question a lot!

Thanks.

Regards,

Neil

neil-benn avatar Jun 11 '21 18:06 neil-benn

Could you also attach some of the hs_err_pid*.log files that you get when it crashes?

saudet avatar Jun 11 '21 23:06 saudet

Since you're not calling delete ag in your C++ code, you shouldn't call ag.close() in Java either. Please remove the call to ag.close() and try again.
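To compare the two tests like with like, it may help to see what the missing delete changes. Assuming JavaCPP's Pointer.close() ends up deleting the native object created by allocate(), the Java test runs the C++ destructor while the C++ test never does. A minimal sketch with a hypothetical stand-in class (FakeGrab is not the real ArduGrab, and its methods do nothing):

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for ArduGrabLibrary::ArduGrab (assumption: the real
// class does driver cleanup in its destructor or leaves it to closeCamera()).
struct FakeGrab {
    static int live;            // number of objects currently alive
    FakeGrab()  { ++live; }
    ~FakeGrab() { --live; }     // code path the original C++ test never reaches
    int initCamera()  { return 0; }
    int closeCamera() { return 0; }
};
int FakeGrab::live = 0;

// Mirrors the Java test including ag.close(): the object is destroyed.
int run_with_delete() {
    {
        std::unique_ptr<FakeGrab> ag = std::make_unique<FakeGrab>();
        assert(FakeGrab::live == 1);
        ag->initCamera();
        ag->closeCamera();
    }                           // unique_ptr calls delete here, ~FakeGrab() runs
    return FakeGrab::live;      // objects still alive after the test body
}

// Mirrors the original C++ test: new without delete, destructor never runs.
int run_without_delete() {
    FakeGrab* ag = new FakeGrab();
    ag->initCamera();
    ag->closeCamera();
    int alive = FakeGrab::live; // sampled where the original test returns
    delete ag;                  // cleanup for this demo only
    return alive;
}
```

Calling run_with_delete() leaves no live object, while run_without_delete() still has one alive at the point where the original test returns, so the two tests only exercise the same code once both either destroy the object or both skip it.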

saudet avatar Jun 11 '21 23:06 saudet

Hello,

Thanks, I removed the ag.close() call but it still happens. I've attached the log files; note that sometimes I get a seg fault but the JVM doesn't crash. There are three crashes. All happened the same way, but it may help to have more than one instance.

Thanks again - I'm back at my desk now after a family related hiatus!

Regards,

Neil

hs_err_pid12376.log hs_err_pid15671.log hs_err_pid15876.log

neil-benn avatar Jun 15 '21 12:06 neil-benn

Last time I heard, the OpenJDK builds that come with Linux distributions for ARM didn't work well at all. Please try the builds from, for example, Amazon or Oracle instead.

saudet avatar Jun 15 '21 13:06 saudet

That may be a problem, as I'm currently limited to a 32-bit Debian and Oracle/Amazon only have 64-bit JDKs for 11, but I've found an armhf JDK 8. I'll try that.

neil-benn avatar Jun 15 '21 13:06 neil-benn

@Grumpy141 @vb216 Which version of the JDK are you guys using?

saudet avatar Jun 15 '21 13:06 saudet

OK, I've put in Oracle JDK 8 (armhf). It's not a permanent solution, as I'll need 11 at some point in the future, but it is good for research purposes. Interestingly, I'm not getting a JVM crash, but the seg faults still happen, with the process terminating:

pi@raspberrypi:~/ArduGrab $ java -version
java version "1.8.0_291"
Java(TM) SE Runtime Environment (build 1.8.0_291-b10)
Java HotSpot(TM) Server VM (build 25.291-b10, mixed mode)

pi@raspberrypi:~/ArduGrab $ java -classpath src/util/lib/:src/util/lib/32/:target/ardugrabjava-0.0.1-SNAPSHOT.jar:src/util/java TestArduGrab
set debug true
set sim false
init camera
Found sensor imx298 at address 1A
init camera done
setting exposure to 3000
about to call set_control
Segmentation fault

The line where we call set_control is where we map through to the underlying MIPI driver. I'm not sure who is printing 'Segmentation fault' to the console: JavaCPP or the JVM?
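For what it's worth, on Linux a bare 'Segmentation fault' line is normally printed by the shell and not by the program itself: the shell waits on the child process and reports the signal that killed it. A minimal POSIX sketch of that mechanism follows (describe_child_death is a hypothetical helper name, and the child here raises SIGSEGV deliberately instead of calling any driver):

```cpp
#include <cassert>
#include <csignal>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that dies from SIGSEGV, then reports what the parent observes.
// This is roughly the information a shell has when it prints its message.
std::string describe_child_death() {
    pid_t pid = fork();
    if (pid < 0) return "fork failed";

    if (pid == 0) {
        // Child: make sure the default action is in place, then signal itself.
        std::signal(SIGSEGV, SIG_DFL);
        std::raise(SIGSEGV);
        _exit(1);               // not reached if the signal terminates the child
    }

    // Parent: wait for the child and decode its termination status.
    int status = 0;
    pid_t reaped = waitpid(pid, &status, 0);
    if (reaped < 0) return "waitpid failed";
    assert(reaped == pid);

    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
        return "Segmentation fault";
    if (WIFEXITED(status))
        return "exited normally";
    return "killed by another signal";
}
```

If that is what is happening here, the message means the java process died from SIGSEGV without the JVM's own crash handler producing an hs_err_pid*.log, which would fit the observation that no JVM crash report appears.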

Cheers,

Neil

neil-benn avatar Jun 15 '21 14:06 neil-benn

I am using JDK 11. When you exit the program, call device._close().

Grumpy141 avatar Jun 15 '21 15:06 Grumpy141

Thanks, the only thing is that I'm getting a seg fault way before the exit: it happens the first time I call the underlying MIPI library. However, I never get this when running the raw C++ code, which is the strange thing.

neil-benn avatar Jun 15 '21 16:06 neil-benn

Hello,

From what I can see there is no device._close method; however, I have another issue. I've written a method that grabs a frame from the camera and writes it to disk (note that it doesn't return the bytes of the image to the Java side but just writes them to disk from the C++ code). When called from C++ this works just fine: it writes a file of 9075980 bytes (this varies slightly due to JPEG compression). When called from Java it only writes 6215798 bytes. I've also put in a wait of 5 seconds on the write just to be sure it is not exiting prematurely. The lib compiled by JavaCPP is as follows:

pi@raspberrypi:~/ArduGrab/target/lib $ file libarducam_mipicamera.so
libarducam_mipicamera.so: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), dynamically linked, BuildID[sha1]=91a29ee0c6644b5761004f80895795f0c0e3eff1, not stripped

The library it depends on is:

libardugrab.so: ELF 32-bit LSB pie executable, ARM, EABI5 version 1 (SYSV), dynamically linked, BuildID[sha1]=697533cbd74dcba70ef41d744e489ad178ba813d, not stripped

The test C++ program is:

testardugrab: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 3.2.0, BuildID[sha1]=2aefa898d01d33586e0167ded544fb86eb17b543, not stripped

It seems that I've got something more fundamental going wrong here. Any advice would be gratefully received. As I mentioned earlier, I understand that this is a unique case and I'm more than happy to pay for consulting to help out on this.

Thanks.

Regards,

Neil

neil-benn avatar Jun 15 '21 17:06 neil-benn

The crash is happening in the JVM itself, so I'm guessing that the native library is corrupting memory. When you have a small C++ program, memory corruption is less likely to cause a crash, so that's probably why you're not seeing it there. Execute that code with Address Sanitizer or Valgrind to make sure that it's not doing anything it's not supposed to.

saudet avatar Jun 15 '21 23:06 saudet

Interestingly, Valgrind hangs on both the C++ and Java executions, even before either application prints any output. I'm doing some research on Valgrind hanging.

neil-benn avatar Jun 16 '21 17:06 neil-benn

Hello,

To update this, I can't get Valgrind to load the application, whether running the Java app or the raw C++ app. I'm looking to see if there is some way I can firewall the memory of the exact variable that is populated by the third-party library and is causing the problem, but I'm not a C++ expert by any means. I'm also trying the vendor, but they have gone silent!
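One way to approximate firewalling a buffer without Valgrind is to surround it with canary bytes and check them after each driver call. A minimal sketch, assuming the driver writes into a caller-supplied buffer (GuardedBuffer, fake_driver_write and write_stays_in_bounds are hypothetical names, not part of the Arducam API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kGuard = 16;          // canary bytes on each side
constexpr unsigned char kPattern = 0xA5;    // fill value used for the canaries

// One allocation laid out as [front canary | payload | back canary].
class GuardedBuffer {
public:
    explicit GuardedBuffer(std::size_t payload_size)
        : size_(payload_size), bytes_(payload_size + 2 * kGuard, kPattern) {}

    // Pointer handed to the library; it is only allowed to touch size() bytes.
    unsigned char* data() { return bytes_.data() + kGuard; }
    std::size_t size() const { return size_; }

    // True while both canaries still hold the fill pattern.
    bool intact() const {
        for (std::size_t i = 0; i < kGuard; ++i) {
            if (bytes_[i] != kPattern) return false;
            if (bytes_[kGuard + size_ + i] != kPattern) return false;
        }
        return true;
    }

private:
    std::size_t size_;
    std::vector<unsigned char> bytes_;
};

// Stand-in for a driver call that writes n bytes to dst (hypothetical).
void fake_driver_write(unsigned char* dst, std::size_t n) {
    std::memset(dst, 0xFF, n);
}

// Returns true if a write of `written` bytes left the canaries untouched.
bool write_stays_in_bounds(std::size_t payload, std::size_t written) {
    // Keep the simulated overrun inside the allocation so the demo is well defined.
    assert(written <= payload + kGuard);
    GuardedBuffer buf(payload);
    fake_driver_write(buf.data(), written);
    return buf.intact();
}
```

Here write_stays_in_bounds(64, 64) reports an intact buffer and write_stays_in_bounds(64, 70) reports a damaged back canary. The check only catches overruns that land within the guard regions, so it narrows down which call is writing out of bounds but does not rule out corruption elsewhere.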

As an aside, I also tried with SWIG to see if the problem still happened, and it is more robust than with JavaCPP but still has intermittent seg faults.

Thanks for your help - everything else is great; when it doesn't seg fault I can quickly get a 4682x3504 YUV back from the camera and pull it into OpenCV with a small CMOS camera!

Cheers,

Neil

neil-benn avatar Jun 18 '21 16:06 neil-benn