
nrjavaserial in OH 2.5.4 breaks runtime on FreeBSD

Open RafalLukawiecki opened this issue 5 years ago • 8 comments

There are several reports of nrjavaserial-3.15.0.OH2 causing Java abort traps in OH 2.5.4. See the latest thread here, for example this log reported by MrRusch:

RXTX Warning:  Removing stale lock file. /var/spool/lock/LK.255.000.138
RXTX uucp_lock() /var/spool/lock/LK.255.000.138 is there
/dev/cuaU0 testRead() Lock file failed
RXTX uucp_lock() /var/spool/lock/LK.255.000.138 is there
/dev/cuaU0 testRead() Lock file failed
Abort trap

I have tested with different serial ports linked and mounted in different parts of the filesystem, including over TCP using ser2net, and I see the exact same issue when using the ZWave binding 2.5.5 talking to an Aeotec Gen5 stick.

What is weird is that OH is working and the binding is working, usually for about 30 seconds, and then it throws the above errors and aborts. Once I managed to keep it going for a few minutes. During that time it produced the above lock file messages repeatedly, creating and deleting the lock file, and the binding appeared to go offline every 30 seconds or so. In between, the binding received messages and operated well.
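
To check which device a lock file such as `LK.255.000.138` belongs to, and whether the process that owns it is still running, a small script can help. This is only a sketch under assumptions: it assumes the name is built from the major number of the filesystem holding the device node, followed by the major and minor numbers of the device itself (the SVR4-style pattern the log suggests), and that the lock file holds the owner's PID. The function names are hypothetical and are not part of nrjavaserial.

```python
import os

# Lock directory as it appears in the log above.
LOCK_DIR = "/var/spool/lock"


def lock_file_name(dev_major: int, rdev_major: int, rdev_minor: int) -> str:
    """Format an SVR4-style lock name, e.g. LK.255.000.138 (assumed pattern)."""
    return "LK.%03d.%03d.%03d" % (dev_major, rdev_major, rdev_minor)


def lock_path_for(device: str) -> str:
    """Build the lock path that would be expected for a device node."""
    st = os.stat(device)
    name = lock_file_name(
        os.major(st.st_dev),   # filesystem that holds the device node
        os.major(st.st_rdev),  # device major number
        os.minor(st.st_rdev),  # device minor number
    )
    return os.path.join(LOCK_DIR, name)


def lock_owner_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

For example, `lock_path_for("/dev/cuaU0")` can be compared with the path printed in the log, and `lock_owner_alive()` can be called with the PID read from that file to see whether the lock is really stale.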

I wonder if #1426 could help in 2.5.x. I will also help getting the native nrjavaserial binaries 5.0.0 for FreeBSD, see this issue.

RafalLukawiecki • May 05 '20 09:05

Having recompiled openhab-core 2.5 using nrjavaserial-5.0.0 (without OH's own locking modifications; where can I find them?), I am still getting an Abort trap under FreeBSD when using a socat'ed port, but the error is a little different this time:

uucp_lock() /var/spool/lock/LK.255.000.241 is there
Abort trap

If anyone can point me to the branch or commit containing the OH modifications to nrjavaserial I can try again. I looked in the openhab/nrjavaserial repo but the master branch (last commit 2017) is already merged into 5.0.0 from neuronrobotics, unless I have missed something.

RafalLukawiecki • May 06 '20 09:05

I rebuilt them using the same sources as nrjavaserial but with some different compiler flags. See this nrjavaserial-builder Docker container.

I will no longer recompile the libraries, because all issues I ran into were resolved in 3.20.0. See also: https://github.com/openhab/openhab-core/pull/1426. I didn't test using the library with socat on FreeBSD. :wink:

So if you still have issues with nrjavaserial, create an issue for them in the nrjavaserial issue tracker.

Also, if you want FreeBSD to be properly supported, try to figure out how to cross compile the libraries on Ubuntu as was suggested in https://github.com/NeuronRobotics/nrjavaserial/issues/164#issuecomment-624041745. Otherwise someone running FreeBSD will always have to create a PR with the recompiled libraries.

wborn • May 06 '20 11:05

Thank you, @wborn. I am still a bit unsure how the dependencies work here. In any case, looking at your Docker scripts, am I right in thinking that the key functional difference is your use of the lockdev/liblockdev1 library? I cannot see that library in the FreeBSD ports ecosystem. I wonder if BSD manages locking differently.

Do you think the lack of the lockdev library could be the reason things broke when I tried using nrjavaserial-5.0.0 in OH 2.5? Even if nrjavaserial was working, could it be that OH makes some explicit calls to check the locks, etc.? Or does it sound like a FreeBSD-specific bug in nrjavaserial, after all?

I have too little experience with Ubuntu or Docker to help much in automating the cross compilation of these libraries on Ubuntu. I would not even know how to run FreeBSD on Ubuntu.

I am happy, however, to occasionally recompile nrjavaserial when I get the ping.

RafalLukawiecki • May 06 '20 12:05

> the key functional difference is your use of the lockdev/liblockdev1 library

That's right, the locking with liblockdev used by nrjavaserial was causing issues, so I recompiled the libraries to use no locking at all. That might still have caused issues, e.g. when the application connects multiple times to the same port. But overall it resulted in fewer issues.

Recently nrjavaserial switched the locking mechanism back from liblockdev to a built-in locking mechanism which worked well after some testing and a small fix, see the discussion in https://github.com/NeuronRobotics/nrjavaserial/issues/156#issuecomment-611715451.

It doesn't look like liblockdev was ever used with the FreeBSD library.

> could it be that OH makes some explicit calls to check the locks etc

The openHAB code does not and should never have any knowledge of the locking mechanisms being used by this library.

> Or does it sound like a FreeBSD-oriented bug with nrjavaserial, after all

When the JVM crashes, it creates an hs_err_pid file with detailed logging and the reason for the crash. Usually it's a SIGSEGV, aka a segmentation fault.
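
A quick way to pull the relevant lines out of such a file is sketched below. It assumes the usual HotSpot layout, where the header names the signal and a "Problematic frame" line follows, and the default file name `hs_err_pid<pid>.log` in the working directory of the Java process. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Header line such as "#  SIGSEGV (0xb) at pc=..., pid=..., tid=..."
SIGNAL_RE = re.compile(r"^#\s+(SIG[A-Z0-9]+)\s+\((0x[0-9a-fA-F]+)\)", re.MULTILINE)
# The frame is printed on the line that follows "# Problematic frame:"
FRAME_RE = re.compile(r"^# Problematic frame:\s*\n#\s*(.+)$", re.MULTILINE)


def summarize_hs_err(text: str) -> dict:
    """Extract the signal name and the problematic frame, if present."""
    signal = SIGNAL_RE.search(text)
    frame = FRAME_RE.search(text)
    return {
        "signal": signal.group(1) if signal else None,
        "frame": frame.group(1).strip() if frame else None,
    }


def find_hs_err_files(directory: str) -> list:
    """List crash logs in a directory using the default file name pattern."""
    return sorted(Path(directory).glob("hs_err_pid*.log"))


if __name__ == "__main__":
    # Scan the current directory; adjust to the openHAB working directory.
    for path in find_hs_err_files("."):
        print(path, summarize_hs_err(path.read_text(errors="replace")))
```

If the frame points into the native serial library rather than into the JVM itself, that is a strong hint that the crash belongs in the nrjavaserial issue tracker.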

wborn • May 06 '20 13:05

Thank you for the clarifications. I will locate the hs_err_pid file and see what clues I find in there. Having said that, I wonder if you had a chance to see my other issue about rfc2217 not working, and the slight change in behaviour (it now shows locking issue messages) after the change to 5.0.0?

RafalLukawiecki • May 06 '20 13:05

Sorry, I have never used serial devices using rfc2217.

wborn • May 06 '20 14:05

Is this still the case with the latest openHAB? We are currently using nrjavaserial 5.2.1.OH1.

J-N-K • Apr 30 '22 15:04

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] • Jun 29 '22 16:06