lingua-franca icon indicating copy to clipboard operation
lingua-franca copied to clipboard

Memory violation at _lf_initialize_trigger_objects with Python backend

Open turlando opened this issue 1 year ago • 4 comments

Hello,

I'm using LF through Epoch 0.3.0 on ArchLinux with the Python (3.10.7) backend and I'm facing weird-looking segmentation faults.

Unfortunately I could not narrow the issue down to a smaller test case. Here is the code I've written so far.

target Python

preamble {=
    import math
    import jack
    import mido
    import struct
=}

reactor Jack {
    state client ({= jack.Client("lf_py_synth") =})
    state audio_out ({= self.client.outports.register("audio_out") =})
    state midi_in ({= self.client.midi_inports.register("midi_in") =})

    physical action incoming_midi_event
    physical action buffer_request

    input audio_in
    output midi_out
    output produce_buffer

    reaction(startup) -> incoming_midi_event, buffer_request {=
        def process(frames):
            buffer_request.schedule(0, frames)

            for offset, data in self.midi_in.incoming_midi_events():
                raw = struct.unpack("3B", data)
                msg = mido.Message.from_bytes(raw)
                incoming_midi_event.schedule(0, msg)

        self.client.set_process_callback(process)
        self.client.activate()
    =}

    reaction(incoming_midi_event) -> midi_out {=
        midi_out.set(incoming_midi_event.value)
    =}

    reaction(buffer_request) -> produce_buffer {=
        produce_buffer.set(buffer_request.value)
    =}
}

reactor MidiMonitor {
    input midi_in

    reaction(midi_in) {=
        if midi_in.value is None:
            return

        print(midi_in.value)
    =}
}

reactor MidiToAnalog {
    input midi_in
    output gate
    output frequency

    preamble {=
        @staticmethod
        def midi_to_frequency(note):
            return 2 ** ((note - 69) / 12) * 440
    =}

    reaction(midi_in) -> gate, frequency {=
        if midi_in.value is None:
            return

        if midi_in.value.type == "note_on":
            gate.set(1.0)
            frequency.set(self.midi_to_frequency(midi_in.value.note))
        elif midi_in.value.type == "note_off":
            gate.set(0.0)
    =}
}

reactor Voice {
    state SAMPLE_RATE ({= 48000.0 =})
    state FRAME_TIME ({= 1.0 / self.SAMPLE_RATE =})

    state clock ({= 0.0 =})
    state freq ({= 0.0 =})

    input produce_buffer
    input frequency
    output audio_out

    reaction(frequency) {=
        if frequency.value is None:
            return

        self.freq = frequency
    =}

    reaction(produce_buffer) -> audio_out {=
        if produce_buffer.value is None:
            return

        buffer = []

        for i in range(produce_buffer.value):
            buffer.append(math.sin(2 * math.pi * self.clock))
            self.clock += self.FRAME_TIME
    =}
}

main reactor {
    jack = new Jack();
    midi_monitor = new MidiMonitor()
    midi_to_analog = new MidiToAnalog()
    voice = new Voice()

    jack.midi_out -> midi_monitor.midi_in
    jack.midi_out -> midi_to_analog.midi_in
    jack.produce_buffer -> voice.produce_buffer
    midi_to_analog.frequency -> voice.frequency
}

The Python modules required to run this code are: JACK-Client, mido.

Compiling and running this code causes an immediate segmentation fault without any log message whatsoever, not even "Start execution at time ...".

After a very painful debugging session I traced the issue down to the frequency reaction in the Voice reactor. If I remove the following code, the program will work correctly.

    reaction(frequency) {=
        if frequency.value is None:
            return

        self.freq = frequency
    =}

As you can see this code fragment is doing barely anything.

Here is the complete stack trace:

#0  0x00007fdaa3b43435 in _lf_initialize_trigger_objects () at file:/home/user/Epoch/lf-py-synth/src/Main.lf:554
#1  0x00007fdaa3b43718 in initialize () at core/threaded/../reactor_common.c:1960
#2  0x00007fdaa3b43cd7 in lf_reactor_c_main (argc=1, argv=argv@entry=0x55812f3e5ea0) at core/threaded/reactor_threaded.c:1175
#3  0x00007fdaa3b43e6e in py_main (self=<optimized out>, py_args=<optimized out>) at /home/user/Epoch/lf-py-synth/src-gen/Main/pythontarget.c:259
#4  0x00007fdaa3756998 in ?? () from /usr/lib/libpython3.10.so.1.0
#5  0x00007fdaa375024b in _PyObject_MakeTpCall () from /usr/lib/libpython3.10.so.1.0
#6  0x00007fdaa374afac in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.10.so.1.0
#7  0x00007fdaa3756e29 in _PyFunction_Vectorcall () from /usr/lib/libpython3.10.so.1.0
#8  0x00007fdaa3746186 in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.10.so.1.0
#9  0x00007fdaa3744dd0 in ?? () from /usr/lib/libpython3.10.so.1.0
#10 0x00007fdaa37f3fb4 in PyEval_EvalCode () from /usr/lib/libpython3.10.so.1.0
#11 0x00007fdaa38039d3 in ?? () from /usr/lib/libpython3.10.so.1.0
#12 0x00007fdaa37ff36a in ?? () from /usr/lib/libpython3.10.so.1.0
#13 0x00007fdaa36a273c in ?? () from /usr/lib/libpython3.10.so.1.0
#14 0x00007fdaa36a23ed in _PyRun_SimpleFileObject () from /usr/lib/libpython3.10.so.1.0
#15 0x00007fdaa36a2da0 in _PyRun_AnyFileObject () from /usr/lib/libpython3.10.so.1.0
#16 0x00007fdaa38102ad in Py_RunMain () from /usr/lib/libpython3.10.so.1.0
#17 0x00007fdaa37e547b in Py_BytesMain () from /usr/lib/libpython3.10.so.1.0
#18 0x00007fdaa343c2d0 in ?? () from /usr/lib/libc.so.6
#19 0x00007fdaa343c38a in __libc_start_main () from /usr/lib/libc.so.6
#20 0x000055812ed1c045 in _start ()

Let me know if I can provide you with more information to track down the bug.

Thank you.

turlando avatar Sep 10 '22 15:09 turlando

Hi @turlando! Thanks for opening this issue. This is just a guess, but your problem could stem from the usage of return within the reaction. I believe that the code generator inserts some clean-up code after the reaction body and returning from the body skips this code. Could you try if it works when you use this instead:

    reaction(frequency) {=
        if frequency.value is not None:
            self.freq = frequency
    =}

Also, I just noticed while writing this, that you assign frequency to self.freq. Shouldn't this be frequency.value instead? You don't seem to use self.freq anywhere else. So this probably does not explain your issue, but it might also cause problems later on.

cmnrd avatar Sep 11 '22 15:09 cmnrd

This issue is much appreciated.

I reproduced the problem and looked at the stack trace in gdb with #line directives disabled. The segfault results from the fact that turlando_midi_to_analog_self[src_runtime]->_lf__reaction_0.triggers[1] is NULL. We use index 1 (and not 0, which is a valid index) because of the second line in this snippet of generated code:

        for (int i = 0; i < 1; i++) triggers_index[i] = 0;
        for (int i = 0; i < 1; i++) triggers_index[i] = 1;

I'm hesitant to debug this further before refactoring all of _lf_initialize_trigger_objects. The generated code, _lf_initialize_trigger_objects, is a long monolithic block of imperative code that all writes to the same (often nontrivial) pointer-based data structures, and deferredFillTriggerTable, the method in the code generator that caused this problem, is 120 lines long, contains 16 loops and branches, and has a max nesting depth of 9 levels of control flow.

This is not the only segmentation fault that we have observed in Python on startup, but AFAIK it is the only one that could be conclusively traced back to a specific line in our code or generated code. For this reason I suspect it is separate from this and this, although it might be related.

petervdonovan avatar Sep 11 '22 21:09 petervdonovan

Hello @cmnrd and @petervdonovan! Thank you a lot for the time and the effort you've put into this issue.

[...] your problem could stem from the usage of return within the reaction. I believe that the code generator inserts some clean-up code after the reaction body and returning from the body skips this code [...]

I have to admit that it's been a bit naive of me. I started placing guards around the input values because of an another issue I found earlier. The test case is relatively simple and shorter version of the program I presented in my first message.

target Python

preamble {=
    import jack
    import mido
    import struct
=}

reactor Jack {
    state client ({= jack.Client("lf_py_synth") =})
    state midi_in ({= self.client.midi_inports.register("midi_in") =})

    physical action incoming_midi_event
    output midi_out

    reaction(startup) -> incoming_midi_event {=
        def process(frames):
            for x in self.midi_in.incoming_midi_events():
                incoming_midi_event.schedule(0, x)

        self.client.set_process_callback(process)
        self.client.activate()
    =}

    reaction(incoming_midi_event) -> midi_out {=
        midi_out.set(incoming_midi_event.value)
    =}
}

reactor MidiMonitor {
    input midi_in

    reaction(midi_in) {=
        data = struct.unpack("3B", midi_in.value[1])
        msg = mido.Message.from_bytes(data)
        print(msg)
    =}
}

main reactor {
    jack = new Jack();
    printer = new MidiMonitor()
    jack.midi_out -> printer.midi_in
}

If I play notes relatively fast, sometimes midi_in.value is None, as you can see in the following output.

---- Start execution at time Mon Sep 12 09:07:11 2022
---- plus 862925833 nanoseconds.
---- Using 1 workers.
note_on channel=0 note=41 velocity=100 time=0
note_off channel=0 note=41 velocity=0 time=0
[...]
note_on channel=0 note=54 velocity=100 time=0
note_off channel=0 note=54 velocity=0 time=0
note_off channel=0 note=55 velocity=0 time=0
ERROR: FATAL: Calling reaction MidiMonitor.reaction_function_0 failed.
Traceback (most recent call last):
  File "/home/user/Epoch/lf-py-synth/src-gen/Main/Main.py", line 95, in reaction_function_0
    data = struct.unpack("3B", midi_in.value[1])
TypeError: 'NoneType' object is not subscriptable
[1]    35558 segmentation fault (core dumped)  python3 /home/user/Epoch/lf-py-synth/src-gen/Main/Main.py

Am I doing something wrong here or might it be another bug?

Also, I just noticed while writing this, that you assign frequency to self.freq. Shouldn't this be frequency.value instead?

Nice catch! Unfortunately fixing that line doesn't change the final result, the program is still segfaulting.

You don't seem to use self.freq anywhere else. So this probably does not explain your issue, but it might also cause problems later on.

The program is not complete, as I was trying to use self.freq I started to get these crashes.

This is not the only segmentation fault that we have observed in Python on startup, but AFAIK it is the only one that could be conclusively traced back to a specific line in our code or generated code.

First of all, thank you for tracing this back to the code emitter. This is my first time using LF so I can't really help much, but I can try again to make the test case shorter if you think it might help.

turlando avatar Sep 12 '22 07:09 turlando

For the record, returning from a reaction is allowed. If any target does not allow this, I would consider that a bug. In any case, the LF documentation says nothing about such a restriction.

edwardalee avatar Sep 12 '22 12:09 edwardalee

@turlando, please keep us posted on your progress. It sounds like what you're working on could be quite a fun demo!

lhstrh avatar Sep 28 '22 07:09 lhstrh

Hi @lhstrh and thank you for your message. I've been busy with another project lately but I should get back to LF soon. Given the bugs I faced I'm also thinking of switching to another backend. Rust seemed appealing but when I first tried it I couldn't make the borrow-checker happy when integrating the JACK client library into a reactor. This could definitely be due to my very limited experience with Rust, so I'm more than happy to give it another shot.

turlando avatar Sep 28 '22 14:09 turlando

@turlando Let us know if we can help you in using the JACK library in rust. We haven't really tested yet the interplay of LF-Rust code with external libraries, but me and @oowekyala would be happy to help you out with it and try to find fixes if there are any problems on our end.

cmnrd avatar Oct 07 '22 11:10 cmnrd

@cmnrd @oowekyala let me know if it's fine to abuse this issue or we'd better move to another communication channel.

Is it possible, using the state statement inside of a reactor, to destructure a tuple in an assignment and/or to do type inference?

For instance, operations on the JACK bus are performed through jack::Client, whose constructor returns a Result of the tuple (Client, ClientStatus). What I would like to do is something like let (client, _status) = jack::Client::new("...", jack::ClientOptions::NO_START_SERVER).unwrap();. Is it possible to do something similar inside a reactor?

My original solution was to wrap the tuple inside a struct but of course it doesn't scale well.

Another question related to LF per se rather than Rust: is it possible to have dynamic connections of reactors? Or is it impossible as it would make static analysis impossible to perform?

Thank you again for your time.

turlando avatar Oct 08 '22 08:10 turlando

Would it work to invoke the constructor in a reaction to startup? In that case, you can use any arbitrary target-language code.

As for the dynamic connections, the underlying reactor model supports "mutations," which, with some limitations (which preserve required static analysis), allow dynamic connections. However, this is not fully supported yet in any of the target languages, AFAIK. There has been some experimentation in the TypeScript and C targets, but still very limited.

That said, you can get very rich interconnection possibilities with static connections. See for example https://www.lf-lang.org/docs/handbook/multiports-and-banks#interleaved-connections

edwardalee avatar Oct 08 '22 14:10 edwardalee

Is it possible, using the state statement inside of a reactor, to destructure a tuple in an assignment and/or to do type inference?

This is not currently possible in the Rust target. State variable declarations in LF are translated to a Rust struct field, and so we need an explicit type and a single name.

For instance, operations on the JACK bus are performed through jack::Client, whose constructor returns a Result of the tuple (Client, ClientStatus). What I would like to do is something like let (client, _status) = jack::Client::new("...", jack::ClientOptions::NO_START_SERVER).unwrap();. Is it possible to do something similar inside a reactor?

My original solution was to wrap the tuple inside a struct but of course it doesn't scale well.

If I understand correctly, you only need the client and not the status? In that's case you can maybe select the field:

state client: jack::Client ({= jack::Client::new(...).unwrap().0 =})

Another solution is to store the entire tuple as a state var:

state clientAndStatus: {= (jack::Client, jack::ClientStatus) =} ({= jack::Client::new(...).unwrap() =})

oowekyala avatar Oct 10 '22 08:10 oowekyala