
Preliminary support for scheduling enclaves in the C target

Open erlingrj opened this issue 1 year ago • 9 comments

This PR adds preliminary support for scheduling enclaves in the C target. This is a new attempt at a simpler AST transformation that only replaces the connections between enclaves with generated EnclavedConnection reactors. It is modeled after the DelayedConnections and PhysicalConnections in the C target.

Limitations:

  1. No zero-delay cycles. Enclaves do not coordinate with PTAGs yet
  2. No banks of enclaves
  3. No enclaves with multiport
  4. No enclaved connections using either broadcast connections or connections with multiple ports on each side of the connection operator.

Corresponding reactor-c PR: https://github.com/lf-lang/reactor-c/pull/308

erlingrj avatar Nov 14 '23 17:11 erlingrj

I have now tagged you @lhstrh and @edwardalee. I think this, and the corresponding PR in reactor-c, are getting ready. This is preliminary support for enclaves; the following are not yet supported:

  • Enclaves in banks
  • Enclaves with multiports
  • Enclaves using multi-connections (multiple connections in one statement)
  • Enclaves with array ports
  • Enclaves with zero-delay cycles (no PTAG stuff)
  • Enclaves inside a mode

erlingrj avatar Nov 23 '23 14:11 erlingrj

Hmmmm, all the Enclave test programs crash on Windows with a STATUS_ACCESS_VIOLATION, which means "Reading or writing to an inaccessible memory location."

erlingrj avatar Dec 15 '23 13:12 erlingrj

Hmmmm, all the Enclave test programs crash on Windows with a STATUS_ACCESS_VIOLATION, which means "Reading or writing to an inaccessible memory location."

I have found a number of errors in the termination function w.r.t. enclaves. I am fixing these in the remove-absent-messages branch. Not sure whether they are related...

Actually, I take it back. These were not errors...

edwardalee avatar Dec 16 '23 19:12 edwardalee

In my experience, MSVC is stricter than GCC, and C programs with undefined behaviour often crash on Windows but not on Linux. My hunch is that some issue in rti_common.c was introduced or exposed recently: I didn't see this error until I merged this branch with the latest changes on main. Before the enclave support, rti_common.c was never compiled for Windows, since it was only part of the RTI, which is only compiled for macOS and Linux.

Is it possible that the issues we are having on remove-absent-messages are caused by the same potential coding error which results in undefined behaviour?

erlingrj avatar Dec 16 '23 20:12 erlingrj

Is it possible that the issues we are having on remove-absent-messages are caused by the same potential coding error which results in undefined behaviour?

Yes, this is very possible.

edwardalee avatar Dec 16 '23 20:12 edwardalee

I double-checked the code in the last merge and couldn't find anything suspicious. Unfortunately, ssh into CI doesn't work anymore, so I have neither a way to see the output nor a way to replicate the errors occurring in remove-absent-messages (macOS only)... not sure how to proceed.

edwardalee avatar Dec 17 '23 05:12 edwardalee

Tomorrow I will try to reproduce the error in this branch on a Windows VM. Hopefully I will get some info on where the memory error is. I will report back; maybe these errors are related.

erlingrj avatar Dec 17 '23 20:12 erlingrj

@edwardalee this is a long shot, but I addressed a few memory issues in this commit: https://github.com/lf-lang/reactor-c/pull/308/commits/37a1249afee9b2f459e66c6d487efa437df49354 and it does affect the RTI. Malloc is used to create a new struct, and one of the fields of the struct is a pointer. This field is not explicitly set to NULL, and later, if it is non-NULL, it is assumed to have been allocated and is freed.

You could cherry-pick it to your branch. But, yeah, it's a long shot.
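
To illustrate the pattern described above, here is a minimal sketch. It is not the code from the linked commit: the struct `node_t` and the functions `node_new`, `node_set_buffer`, and `node_free` are hypothetical names made up for this example. With plain `malloc`, the pointer field would hold an indeterminate value, and a later "free if non-NULL" check would pass garbage to `free`, which is undefined behaviour. Initializing the field explicitly avoids that.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct for illustration; not taken from reactor-c. */
typedef struct {
    int id;
    char *buffer; /* Owned heap buffer, or NULL if nothing was allocated. */
} node_t;

/* Allocate a node with every field in a known state. */
node_t *node_new(int id) {
    node_t *n = malloc(sizeof *n);
    if (n == NULL) {
        return NULL;
    }
    n->id = id;
    /* Without this line, n->buffer is indeterminate after malloc. */
    n->buffer = NULL;
    return n;
}

/* Replace the node's buffer with a copy of text. Returns 0 on success. */
int node_set_buffer(node_t *n, const char *text) {
    assert(n != NULL && text != NULL);
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, text, len);
    free(n->buffer); /* Safe: buffer is either NULL or a live allocation. */
    n->buffer = copy;
    return 0;
}

/* Release the node and anything it owns. */
void node_free(node_t *n) {
    if (n == NULL) {
        return;
    }
    free(n->buffer); /* free(NULL) is a no-op, so no extra check is needed. */
    free(n);
}
```

A node created with `node_new(7)` can be passed straight to `node_free` even if `node_set_buffer` was never called, which is the case that goes wrong when the field is left uninitialized.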

erlingrj avatar Dec 18 '23 12:12 erlingrj

Thanks. I've cherry-picked this, but it's unlikely to make a difference because the deadlock appears to be occurring during shutdown. I do now have a lead, however. It turns out that a failure to write to a socket triggers a SIGPIPE signal which, by default, shuts down the program. However, shutting down the program causes a termination function to be invoked, which tries to acquire a mutex lock that is used to protect writes to a socket. This could lead to a deadlock. I'm trying two experiments now. First, see whether ignoring the SIGPIPE signal prevents the problem. If this fails (each experiment takes a while), then I will try avoiding acquiring the mutex lock in the terminate_execution function. This could, however, result in a corrupted resign message to the RTI, so I'm hoping the first solution will work.
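
For reference, the first experiment (ignoring SIGPIPE) can be sketched as follows on a POSIX system. This is only an illustration under assumptions, not the change made in reactor-c: `ignore_sigpipe` and `try_write` are hypothetical helper names. Once SIGPIPE is ignored, a write to a socket or pipe whose peer has closed fails with `errno` set to `EPIPE` instead of terminating the process, so the caller can handle the error without the termination path being triggered while a lock is held.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: ignore SIGPIPE process-wide. Returns 0 on success. */
int ignore_sigpipe(void) {
    return (signal(SIGPIPE, SIG_IGN) == SIG_ERR) ? -1 : 0;
}

/* Hypothetical helper: write len bytes to fd, retrying on EINTR and on
 * short writes. Returns 0 on success, or -1 with errno set (EPIPE if the
 * peer has closed and SIGPIPE is ignored). */
int try_write(int fd, const void *buf, size_t len) {
    assert(buf != NULL);
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* Interrupted before writing anything; retry. */
            }
            return -1; /* Real failure; errno tells the caller why. */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would invoke `ignore_sigpipe()` once at startup and then treat a `-1` return from `try_write` with `errno == EPIPE` as "the other side is gone". SIGPIPE does not exist on Windows, so this sketch applies only to the macOS and Linux builds.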

edwardalee avatar Dec 18 '23 19:12 edwardalee