lingua-franca icon indicating copy to clipboard operation
lingua-franca copied to clipboard

Capability to output individual federates as standalone LF code

Open lhstrh opened this issue 3 years ago • 5 comments

Edit. Refactor logic related to federated execution out of GeneratorBase, CGenerator, PythonGenerator, and TSGenerator into a new org.lflang.federated.FedGenerator.

  • [x] Move analyzeFederates() out of GeneratorBase into FedGenerator (@Soroosh129)
  • [x] Have FedGenerator create separate .lf files for each federate (see #1212) (@Soroosh129)
  • [ ] To make each separate generated federate executable for all targets, the following needs to be done:
    • [-] Implement the federate target extensions for the following targets:
      • [x] TS (@lhstrh, @hokeun)
      • [ ] C++ (@cmnrd)
      • Note: For the TS target, the code for these interfaces is already in the TSGenerator (e.g., here)
    • [x] Implement "executable" preambles (using a syntax such as init (suggested by @cmnrd)) for each target to take care of pre-start tag initializations:
      • [x] C target (@billy-bao)
      • [x] Create a mechanism that can convert TargetConfig to TargetDecl
      • Note: For the C and TS targets, the code for pre-start tag initialization is in their respective code generators, usually surrounded by if (isFederated) checks. Implementing the executable preamble would mostly require moving these blocks to the federated generator.
  • [ ] Implement an "ever" or "observer" or "monitor" trigger used in network control reactions (@Soroosh129). The scope for now should be the main reactor (because the semantics of it, if contained within nested reactors, is unclear).
  • [x] ~~Output interface instead of using the @interface label (@lhstrh)~~ Add internal dependency information to reactions
  • [x] Have FedGenerator call each code generator depending on the target of each generated .lf file (@billy-bao)

lhstrh avatar Jun 10 '22 21:06 lhstrh

Grr, GitHub does not allow for the source branch of a PR to be changed, so I will have to move the title and comments used here to #1227.

petervdonovan avatar Jun 13 '22 08:06 petervdonovan

@hokeun:

I tried to Lfc-compile test/TypeScript/src/federated/HelloDistributed.lf, but I got an error like this:

You are using zero-delay logical connections between federates. This currently does not work because the causality interface (@interface, see the original comment on this PR) is not implemented yet. Adding an after delay should circumvent this problem for now.

Soroosh129 avatar Jul 24 '22 09:07 Soroosh129

I'm a bit stuck at the moment in my attempt to remove currentFederate from the CGenerator.

Specifically, there are multiple places in the CGenerator where running counts (that are class attributes) get incremented by currentFederate.numRuntimeInstances(...). I have no idea what these are for and how I might be able to remove them. @edwardalee would you happen to remember what each of these running counts are for? I think I need your help to get rid of them.

Here are some examples where this pattern is used: 1 2 3 4 5 6 7.

Soroosh129 avatar Jul 24 '22 09:07 Soroosh129

I'm a bit stuck at the moment in my attempt to remove currentFederate from the CGenerator.

Specifically, there are multiple places in the CGenerator where running counts (that are class attributes) get incremented by currentFederate.numRuntimeInstances(...). I have no idea what these are for and how I might be able to remove them. @edwardalee would you happen to remember what each of these running counts are for? I think I need your help to get rid of them.

These counts are used to determine the sizes of statically allocated arrays, for example, the array of startup reactions that need to be invoked at startup.

The issue here is that one instance of ReactorInstance may represent some number of runtime instances of the reactor. The number depends on whether the reactor is instantiated as a bank, whether any of its containers is instantiated as a bank, and whether it's top-level container is in the federate. The latter condition is particularly subtle because if the top-level container is a bank, then the width of the bank is irrelevant and should be treated as 1 because only one bank member will be in the federate.

I presume that the latter subtlety will go away if you are code generating a .lf file for each federate because then no top-level container will be a bank, right? Also, the check for whether the ReactorInstance is in the federate can probably be eliminated because this will not be called on any reactor that is not in the federate. If I'm right about these two, then you can probably replace currentFederate.numRuntimeInstances(reactor) with reactor.getTotalWidth(0). In fact, you could update ReactorInstance.getTotalWidth() to not take a depth argument because I suspect it will only ever be called with depth == 0.

edwardalee avatar Jul 24 '22 13:07 edwardalee

[...] no top-level container will be a bank, right?

Right. At least that's what I'm aiming for.

In fact, you could update ReactorInstance.getTotalWidth() to not take a depth argument because I suspect it will only ever be called with depth == 0.

Thank you! I applied your solution in c75d02a. Luckily, there is already a version of getTotalWidth that takes not argument (and passes 0 as the depth to ReactorInstance.getTotalWidth(int depth)).

Soroosh129 avatar Jul 24 '22 19:07 Soroosh129

Notes to myself: Cleaning logic is here https://github.com/lf-lang/lingua-franca/blob/f1c4e65912cfdd103ca4b3fdf7f9c1aa509e54b0/org.lflang/src/org/lflang/generator/GeneratorBase.java#L383; I tried to fix --clean by simply overriding FedFileConfig::doClean(), but it appears that cleaning logic is using a FileConfig even if --federated is passed in (idk if this is expected, but should be?) (see https://github.com/lf-lang/lingua-franca/commit/bebc935bf5ba2e439c494622f29c61d066b3d643) Gotta find some more elegant way to fix later today

image

EDIT: Add them here: https://github.com/lf-lang/lingua-franca/blob/738367490bacb1360287acaf464bac4d3884129b/org.lflang/src/org/lflang/federated/generator/FedGenerator.java#L122

axmmisaka avatar Oct 25 '22 12:10 axmmisaka

@oowekyala and @hokeun I merged changes coming mostly from you two in this commit: https://github.com/lf-lang/lingua-franca/pull/1221/commits/99f1657b2e611d43afc5c298f7c749a4212d7fdc

Please have a look at it to make sure I didn't botch anything... I don't have a lot of confidence in this merge.

lhstrh avatar Nov 08 '22 23:11 lhstrh

@oowekyala and @hokeun I merged changes coming mostly from you two in this commit: 99f1657

Please have a look at it to make sure I didn't botch anything... I don't have a lot of confidence in this merge.

Thank you, @lhstrh, for your efforts. Is defaultFederateConfig still an issue? I should be able to take another look this weekend.

hokeun avatar Nov 09 '22 08:11 hokeun

defaultFederateConfig is still an issue. It looks like the code for generating the definition in the preamble is still present, but the preamble itself is missing from the generated code. Could potentially be an easy fix, but I don't know the code well enough to quickly find it...

lhstrh avatar Nov 09 '22 08:11 lhstrh

The problem is that the preamble that contains the defaultFederateConfig is generated as a top-level preamble:

// c.lf
...
preamble {=
    const defaultFederateConfig: __FederateConfig = {
        ...
    }
=}


@_fed_config(network_message_actions="")
main reactor  {
    c = new Count()

    @_fed_send
    reaction(c.out) {=
        if (c.out !== undefined) {
            this.util.sendRTITimedMessage(c.out, 1, 0);
        }
    =}
}

this is done here: https://github.com/lf-lang/lingua-franca/blob/71ef8d0701c6a9283905f42eb7736621d0424dae/org.lflang/src/org/lflang/federated/generator/FedPreambleEmitter.java#L41

which is target-independent code. However the TS generator only outputs preambles that are contained in the reactor (so not top-level ones): https://github.com/lf-lang/lingua-franca/blob/71ef8d0701c6a9283905f42eb7736621d0424dae/org.lflang/src/org/lflang/generator/ts/TSReactorGenerator.kt#L147

I'm not sure what the correct behavior is here. Is the scope of a top-level preamble defined in TS? Where should it be generated?

oowekyala avatar Nov 09 '22 14:11 oowekyala

@oowekyala thanks for the pointers! The fix is in 46f0dac.

lhstrh avatar Nov 09 '22 21:11 lhstrh

I am having trouble with generated delay reactors in Python. Maybe @cmnrd knows more?

For context:

This still generates wrong code. For example, AfterNoTypes fails with a
segfault because the body of _lf_gendelay_0reaction_function_0
(everything after `cinit`, which ends around line 256) is wrong. But I
am having trouble tracing through the code generator to find out why.

petervdonovan avatar Jan 12 '23 08:01 petervdonovan

I am having trouble with generated delay reactors in Python. Maybe @cmnrd knows more?

No, sorry. In https://github.com/lf-lang/lingua-franca/pull/1508, I only moved code around, but did not touch the internals of the delay generation in the targets. Did something change in the way the delay bodies are generated in Python it this branch? I could imagine that if the delay bodies changed in this branch, merging with master simply deleted theses updated functions and added the old ones in the new place.

cmnrd avatar Jan 12 '23 10:01 cmnrd

I am having trouble with generated delay reactors in Python. Maybe @cmnrd knows more?

No, sorry. In https://github.com/lf-lang/lingua-franca/pull/1508, I only moved code around, but did not touch the internals of the delay generation in the targets. Did something change in the way the delay bodies are generated in Python it this branch? I could imagine that if the delay bodies changed in this branch, merging with master simply deleted theses updated functions and added the old ones in the new place.

We should inspect the merge closely. Probably threw away chaness from fed-gen.

lhstrh avatar Jan 12 '23 15:01 lhstrh

It looks like the C (and maybe CCpp) test failures that we are seeing now -- in particular, the STP violations and determinism violations -- existed even before we merged fed-gen-wip. I think they are the same as those mentioned in my previous comment in the fed-gen-wip PR. I will probably have to do a binary search through the commit history.

petervdonovan avatar Jan 12 '23 23:01 petervdonovan

According to the core dump, the FloatingPointException which appears in DistributedPhysicalActionUpstreamLong (and a few other failing tests, I think) is thrown from this line, which is used only in the adaptive scheduler.

I don't know why we are using that scheduler in the first place, but anyway -- it means that we're trying to schedule a reaction at a level that the adaptive scheduler thinks should never have any reactions scheduled on it. Alternatively, we might be trying to schedule a reaction at a level that is outside the bounds of the num_workers_by_level array. In any case, num_workers_by_level[level] is zero, which should never happen.

petervdonovan avatar Jan 15 '23 01:01 petervdonovan

It looks like we have a deadline violation in the enclave tests. Do you know how we should handle this, @cmnrd? It seems unlikely that this issue resulted from the changes in this branch, but I'm not sure whether it is a real concurrency bug or just a result of the timing variability in CI.

petervdonovan avatar Jan 15 '23 19:01 petervdonovan

This error is new:

    pnpm: Command failed with exit code 128: git ls-remote --refs git+ssh://[email protected]/lf-lang/reactor-ts.git fed-gen
    Host key verification failed.
    fatal: Could not read from remote repository.

I do not think that we introduced changes that would cause this. Maybe it is just a transient error.

petervdonovan avatar Jan 15 '23 23:01 petervdonovan

This error is new:

    pnpm: Command failed with exit code 128: git ls-remote --refs git+ssh://[email protected]/lf-lang/reactor-ts.git fed-gen
    Host key verification failed.
    fatal: Could not read from remote repository.

I do not think that we introduced changes that would cause this. Maybe it is just a transient error.

It is transient. I have seen it before...

lhstrh avatar Jan 15 '23 23:01 lhstrh

For the record, the newly failing CppRos2Tests probably should have been failing previously. It looks like the fix in the tests by @lhstrh exposed another bug in the tests, where the AST transformation that was used to set the ros2 target property was not having any effect.

petervdonovan avatar Jan 16 '23 01:01 petervdonovan

It looks like we have a deadline violation in the enclave tests. Do you know how we should handle this, @cmnrd? It seems unlikely that this issue resulted from the changes in this branch, but I'm not sure whether it is a real concurrency bug or just a result of the timing variability in CI.

It can be ignored here. I will try to further relax the timings on master. Apparently CI on MacOS is particularly unpredictable...

cmnrd avatar Jan 16 '23 09:01 cmnrd

I just noticed that there are some more weird issues with the Docker support. It seems that docker files are now generated for all federated programs, no matter if they have docker support enabled or not. Also, even containerized federated programs produce a then useless launch script.

cmnrd avatar Jan 18 '23 17:01 cmnrd

Error code 139 observed in this CI run suggests a segfault due to wrong format specifiers in a printf. I have seen a bunch of warnings in the C compilation output of both federates and the RTI, but I'm not sure if they could be the source of the problem. Tagging folks who might be able to help: @erlingrj, @edwardalee, @petervdonovan.

lhstrh avatar Jan 18 '23 23:01 lhstrh

I notice that if I run the same federated containerized test in rapid succession, then after a few tries I inevitably run into this:

DistributedCountContainerized-rti  | WARNING: RTI failed to accept the socket. Resource temporarily unavailable. Trying again.
DistributedCountContainerized-p    | DEBUG: _lf_create_token: Allocated memory for token: 0x7f7b570ae930
DistributedCountContainerized-p    | Federate 1: LOG: Connecting to the RTI.
DistributedCountContainerized-p    | Federate 1: Failed to connect to RTI on port 15045. Trying 15046.
DistributedCountContainerized-p    | Federate 1: Could not connect to RTI at rti. Will try again every 2 seconds.
DistributedCountContainerized-p    | Federate 1: Failed to connect to RTI on port 15046. Trying 15047.
DistributedCountContainerized-p    | Federate 1: Could not connect to RTI at rti. Will try again every 2 seconds.
DistributedCountContainerized-p    | Federate 1: Failed to connect to RTI on port 15047. Trying 15048.
DistributedCountContainerized-p    | Federate 1: Could not connect to RTI at rti. Will try again every 2 seconds.
DistributedCountContainerized-p    | Federate 1: Failed to connect to RTI on port 15048. Trying 15049.
DistributedCountContainerized-p    | Federate 1: Could not connect to RTI at rti. Will try again every 2 seconds.
DistributedCountContainerized-p    | Federate 1: Failed to connect to RTI on port 15049. Trying 15050.
DistributedCountContainerized-p    | Federate 1: Could not connect to RTI at rti. Will try again every 2 seconds.
DistributedCountContainerized-p    | Federate 1: Failed to connect to RTI on port 15050. Trying 15051.
DistributedCountContainerized-p    | Federate 1: Could not connect to RTI at rti. Will try again every 2 seconds.
DistributedCountContainerized-rti  | WARNING: RTI failed to accept the socket. Resource temporarily unavailable. Trying again.

This log suggests to me that the problem is in the RTI, not with the federates. This brings me to another question: which version of the RTI is being used here? The RTI doesn't have any version numbers, so I guess nobody can answer this question, but where does the image come from? The docker-compose.yml file has an entry image: lflang/rti:rti which suggests that an image has been published somewhere. By whom? How do we update it? @housengw do you happen to know anything about this?

lhstrh avatar Jan 19 '23 00:01 lhstrh

OK, it looks like the docker image was published by @housengw: https://hub.docker.com/u/housengw, ten months ago.

lhstrh avatar Jan 19 '23 00:01 lhstrh

Error code 139 observed in this CI run suggests a segfault due to wrong format specifiers in a printf. I have seen a bunch of warnings in the C compilation output of both federates and the RTI, but I'm not sure if they could be the source of the problem. Tagging folks who might be able to help: @erlingrj, @edwardalee, @petervdonovan.

I put in quite a bit of effort to purge warnings, but never got to the RTI or federated code. Help would be welcome.

In lf_tag_64_32.h it says:

// Define PRINTF_TIME and PRINTF_MICROSTEP, which are the printf
// codes (like the d in %d to print an int) for time and microsteps.
// To use these, specify the printf as follows:
//     printf("Time: " PRINTF_TIME "\n", time_value);
// On most platforms, time is an signed 64-bit number (int64_t) and
// the microstep is an unsigned 32-bit number (uint32_t).
// Sadly, in C, there is no portable to print such numbers using
// printf without getting a warning on some platforms.
// On each platform, the code for printf if given by the macros
// PRId64 and PRIu32 defined in inttypes.h.  Hence, here, we import
// inttypes.h, then define PRINTF_TIME and PRINTF_MICROSTEP.
// If you are targeting a platform that uses some other type
// for time and microsteps, you can simply define
// PRINTF_TIME and PRINTF_MICROSTEP directly in the same file that
// defines the types _instant_t, _interval_t, and _microstep_t.
#include <inttypes.h>
#define PRINTF_TIME "%" PRId64
#define PRINTF_MICROSTEP "%" PRIu32

// For convenience, the following string can be inserted in a printf
// format for printing both time and microstep as follows:
//     printf("Tag is " PRINTF_TAG "\n", time_value, microstep);
#define PRINTF_TAG "(" PRINTF_TIME ", " PRINTF_MICROSTEP ")"

edwardalee avatar Jan 19 '23 10:01 edwardalee