ockam icon indicating copy to clipboard operation
ockam copied to clipboard

fix(rust): do not skip starting services when node starts

Open neil2468 opened this issue 2 years ago • 4 comments

Current Behavior

  • Issue #3234 found that when the ockam CLI is used to create, stop and then start a node, that node no longer responds to messages sent to its uppercase service.
  • ockam_api::NodeManager only starts a node's default services if skip_defaults is false. When starting a node that has been stopped, skip_defaults is true (added in PR #3024).

https://github.com/build-trust/ockam/blob/cbf0573a5bbaa1eb7a89486c235f877e49568891/implementations/rust/ockam/ockam_api/src/nodes/service.rs#L492-L498

https://github.com/build-trust/ockam/blob/cbf0573a5bbaa1eb7a89486c235f877e49568891/implementations/rust/ockam/ockam_api/src/nodes/service.rs#L255-L294

Proposed Changes

This assumes it is okay for skip_defaults to be ignored in this case. Is it okay? (@SanjoDeundiak, @mrinalwadhwa)

  • Remove skip_defaults check, so NodeManager always starts a node's default services.
  • Create test case in ockam_command/tests/commands.bats that fails without this PR.
  • Because on CI machines killed processes sometimes do not disappear but are marked <defunct>, modify the StopCommand handler so that a stopped node's pid is removed from the config file, and hence the StartCommand handler does not have to rely on kill 0 pid to test if a process is alive. This was causing https://github.com/build-trust/ockam/issues/3234#issuecomment-1217868771.
  • Fixes #3234.

Checks

neil2468 avatar Sep 10 '22 15:09 neil2468

Hey @neil2468 . I think skip_defaults allows us to create a fresh node, so that later with following commands you could manually set it up to the desired state (create vault, identity, start services, etc.). Let's not remove it. Also, the fact that its value is true while node is starting is (I think) desired behaviour. The problem I see here, is that node should persist its state (e.g. what services at which addresses were started), so that it could correctly recreate it during node start. This is what had been missing for some time, latest attempt to solve that is I think StartupConfig (you can find it in the code). Not sure we'll stick to that approach in the future, but looks like if you want to fix that particular issue with starting services, StartupConfig is the place to make changes. Thoughts? P.S. The other change in this PR looks great, would be nice to merge it separately!

SanjoDeundiak avatar Sep 12 '22 17:09 SanjoDeundiak

Thanks @SanjoDeundiak , I'll take a look at StartupConfig.

neil2468 avatar Sep 12 '22 17:09 neil2468

To me, it looks like startup.json is created and read by ockam_command, but the node’s default services are created within ockam_api (which does not use startup.json).

I would like the starting of services to be next to the code that writes their details to startup.json. So, should we (1) move the starting of default services to ockam_command, or (2) have ockam_api write to startup.json, or (3) something else? I would vote for (1).

I also noted that it seems only ockam_command/src/tcp/… is using startup.json, and when I tried to restart a node with a tcp-listener I got an error (see below). Will startup.json be used more in the future?

> ockam node create n1
> ockam tcp-listener create 127.0.0.1:7000 --at /node/n1
> ockam node stop n1
> ockam node start n1

Attempting to restart node 'n1'
Running: start node ...ok!
error: Found argument 'transport' which wasn't expected, or isn't valid in this context

USAGE:
    ockam [OPTIONS] <SUBCOMMAND>

For more information try --help
Running: create transport ...ok

Thanks in advance!

neil2468 avatar Sep 14 '22 16:09 neil2468

@neil2468 nice analysis! startup.json was really immature piece of functionality. I think ockam_command is more about handling CLI commands and communicating with nodes. ockam_api is where actual functionality and state of the nodes should live. So I think it's 2. Node should remember which services it started and be able to get everything going again during node start. Unless you think there are problems with this approach

SanjoDeundiak avatar Sep 16 '22 11:09 SanjoDeundiak