dnp3 icon indicating copy to clipboard operation
dnp3 copied to clipboard

JVM hard crash in `MasterChannel::removePoll`

Open kevinherron opened this issue 6 months ago • 6 comments

During some refactor/test cycles I managed to screw up my lifecycle management and cause a full JVM crash.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0x00000001372fb608, pid=51369, tid=4611
#
# JRE version: OpenJDK Runtime Environment Zulu17.58+21-CA (17.0.15+6) (build 17.0.15+6-LTS)
# Java VM: OpenJDK 64-Bit Server VM Zulu17.58+21-CA (17.0.15+6-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
# Problematic frame:
# C  [libdnp3_ffi_java3084399562012808444.dylib+0x38f608]  tokio::util::rand::rt::RngSeedGenerator::next_seed::hf9158dd50c60d97f+0x1c
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/kevin/Development/IA/ignition_83/Modules/driver-dnp3-v2/dnp3-gateway-v2/hs_err_pid51369.log
#
# If you would like to submit a bug report, please visit:
#   http://www.azul.com/support/
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

I narrowed it down to calling MasterChannel::removePoll after MasterChannel::shutdown.

Even though this is obviously wrong, I think JVM crashes should be considered a critical bug even if it's a result of misusing the library.

It's easily reproducible:

class Dnp3Crash {

    @Test
    fun `removing a PollId after shutdown causes a crash`() {
        val runtime = Runtime(RuntimeConfig())

        val masterChannel = MasterChannel.createTcpChannel(
            runtime,
            LinkErrorMode.CLOSE,
            MasterChannelConfig(ushort(3)),
            EndpointList("localhost:20000"),
            ConnectStrategy(),
            ClientStateListenerImpl
        )

        val associationId = masterChannel.addAssociation(
            ushort(4),
            AssociationConfig(
                EventClasses.all(),
                EventClasses.all(),
                Classes.all(),
                EventClasses.none()
            ),
            object : ReadHandler {},
            AssociationHandlerImpl,
            AssociationInformationImpl
        )

        val pollId: PollId = masterChannel.addPoll(
            associationId,
            Request.classRequest(true, true, true, true),
            60.seconds.toJavaDuration()
        )

        masterChannel.enable()
        masterChannel.disable()
        masterChannel.shutdown()

        masterChannel.removePoll(pollId)
    }

    object ClientStateListenerImpl : ClientStateListener {
        override fun onChange(state: ClientState) {
            println("onChange: $state")
        }
    }

    object AssociationHandlerImpl : AssociationHandler {
        override fun getCurrentTime(): UtcTimestamp {
            return UtcTimestamp.valid(ulong(System.currentTimeMillis()))
        }
    }

    object AssociationInformationImpl : AssociationInformation {
        override fun taskStart(taskType: TaskType?, functionCode: FunctionCode?, seq: UByte?) {
            println("taskStart: $taskType, $functionCode, $seq")
        }

        override fun taskSuccess(taskType: TaskType?, functionCode: FunctionCode?, seq: UByte?) {
            println("taskSuccess: $taskType, $functionCode, $seq")
        }

        override fun taskFail(taskType: TaskType?, error: TaskError?) {
            println("taskFail: $taskType, error=$error")
        }

        override fun unsolicitedResponse(isDuplicate: Boolean, seq: UByte?) {
            println("unsolicitedResponse: isDuplicate=$isDuplicate, seq=$seq")
        }
    }
}

I'm attaching the hs_err_pid.log file as well.

hs_err_pid51369.log

kevinherron avatar Jun 05 '25 13:06 kevinherron

Thanks for the complete report @kevinherron

This is the same underlying issue as https://github.com/stepfunc/dnp3/issues/376 (use after free).

I'm not sure it's something we can fix without breaking API compatibility before 2.0, but I do want to look at it for the next release.

Bindings to C libraries are hard 🤷

jadamcrain avatar Jun 05 '25 14:06 jadamcrain

without breaking API compatibility before 2.0

If you're open to discussing Java API changes for 2.0 lmk via email.

The non-idiomatic design of all the little Java data classes (private constructors, no getter methods) causes a variety of issues, from little things like #370, to more "serious" issues, like this library being essentially un-mockable for testing.

e.g. HeaderInfo can't be constructed by a user nor mocked because it's just fields, which in turn makes e.g. ReadHandler::handle* methods un-mockable.

public final class HeaderInfo
{
    /**
     * underlying variation in the response
     */
    public Variation variation;
    /**
     * Qualifier code used in the response
     */
    public QualifierCode qualifier;
    /**
     * true if the received variation is an event type, false otherwise
     */
    public boolean isEvent;
    /**
     * true if a flags byte is present on the underlying variation, false otherwise
     */
    public boolean hasFlags;
    
    private HeaderInfo(Variation variation, QualifierCode qualifier, boolean isEvent, boolean hasFlags)
    {
            this.variation = variation;
            this.qualifier = qualifier;
            this.isEvent = isEvent;
            this.hasFlags = hasFlags;
    }
    
    void _assertFieldsNotNull()
    {
        java.util.Objects.requireNonNull(variation, "variation cannot be null");
        java.util.Objects.requireNonNull(qualifier, "qualifier cannot be null");
    }
}

I'm basically relegated to integration testing. Thankfully there is an outstation component to the library that means I don't always need to be integration testing against live outstations, but it doesn't have to be that way if the Java code were idiomatic Java code.

(yes, idiomatic Java code is certainly more verbose and not as pretty)

kevinherron avatar Jun 06 '25 02:06 kevinherron

Reflection with setAccessible works for some of these.

val ctor = HeaderInfo::class.java.getDeclaredConstructor(
    Variation::class.java,
    QualifierCode::class.java,
    Boolean::class.java,
    Boolean::class.java
).also { it.trySetAccessible() }

val headerInfo: HeaderInfo =
    ctor.newInstance(Variation.GROUP30_VAR1, QualifierCode.COUNT8, false, true)

kevinherron avatar Jun 06 '25 12:06 kevinherron

Meh, not as bad as I thought yesterday.

I'll just have to introduce some test helper functions:

    /**
     * Reflectively create a new [HeaderInfo].
     */
    fun newHeaderInfo(
        variation: Variation,
        qualifierCode: QualifierCode,
        isEvent: Boolean,
        isFrozen: Boolean
    ): HeaderInfo {

        val ctor = HeaderInfo::class.java.getDeclaredConstructor(
            Variation::class.java,
            QualifierCode::class.java,
            Boolean::class.java,
            Boolean::class.java
        ).also { it.trySetAccessible() }

        return ctor.newInstance(variation, qualifierCode, isEvent, isFrozen)
    }

Then ReadHandler becomes testable.

    @Test
    fun `test Dnp3Device read with GVI syntax`() {
        val variations = listOf(
            Variation.GROUP30_VAR1,
            Variation.GROUP30_VAR2,
            Variation.GROUP30_VAR5,
            Variation.GROUP30_VAR6
        )

        val readValueIds: List<ReadValueId> = variations.mapIndexed { index, variation ->
            newReadValueId("ns=1;s=[$deviceName]${variation.toGvi(index)}")
        }

        every { clientService.readWithHandler(any<Request>(), any<ReadHandler>()) } answers {
            val readHandler = secondArg<ReadHandler>()

            variations.forEachIndexed { index, variation ->
                val headerInfo = newHeaderInfo(
                    variation,
                    QualifierCode.COUNT8,
                    isEvent = false,
                    isFrozen = false
                )

                readHandler.handleAnalogInput(
                    headerInfo,
                    listOf(
                        AnalogInput(
                            ushort(index),
                            42.0,
                            Flags(Flag.ONLINE),
                            Timestamp.synchronizedTimestamp(ulong(System.currentTimeMillis()))
                        )
                    )
                )
            }

            CompletableFuture.completedFuture(NOTHING)
        }

        val values: List<DataValue> = device.read(
            newReadContext(),
            0.0,
            TimestampsToReturn.Both,
            readValueIds
        )

        values.forEachIndexed { index, it -> println("value[$index]: ${it.value}") }

        val expected0 = Variant.ofInt32(42)
        assertEquals(expected0, values[0].value)
        assertEquals(StatusCode.GOOD, values[0].statusCode)

        val expected1 = Variant.ofInt16(42)
        assertEquals(expected1, values[1].value)
        assertEquals(StatusCode.GOOD, values[1].statusCode)

        val expected2 = Variant.ofFloat(42.0f)
        assertEquals(expected2, values[2].value)
        assertEquals(StatusCode.GOOD, values[2].statusCode)

        val expected3 = Variant.ofDouble(42.0)
        assertEquals(expected3, values[3].value)
        assertEquals(StatusCode.GOOD, values[3].statusCode)
    }

How useful this will end up being TBD...

kevinherron avatar Jun 06 '25 13:06 kevinherron

I think that making some or all of the private constructors public before a 2.0 is low hanging fruit. This could save you the pain in the future of having to hack at accessibility using reflection (ouch).

My general philosophy has been to only expose API that's required to use the library, but I can appreciate why this hinders mock-ability.

I can do a review before the next release (1.7.0) to assess the impact of making the private constructors public.

For 2.0, we can change the patterns for turning our abstract model of a "struct" into Java classes if there's a compelling reason for it.

jadamcrain avatar Jun 06 '25 20:06 jadamcrain

In oo_bindgen there seems to be a distinction between classes and structs.

All of the "struct" things could probably just have public constructors across the board without much concern for safety.

All of the "class" things, which from a quick look seem to be things with a private final long self; field and methods that call native code, seem reasonable to have private constructors, especially since I'm not seeing any of these that also happen to have any other fields. They could still be mocked instead of instantiated if necessary.

In my ideal world you embrace JDK 17 as the minimum (or introduce a flag) and generate all structs as Java records instead :) Their fields would be implicitly final, no "with" mutators, no builder methods, no need to implement toString/equals/hashCode in your code-gen. Big change to the model/API though.

kevinherron avatar Jun 06 '25 21:06 kevinherron