ockam
ockam copied to clipboard
fix(rust): do not skip starting services when node starts
Current Behavior
- Issue #3234 found that when the
ockam
CLI is used to create, stop and then start a node, that node no longer responds to messages sent to itsuppercase
service. -
ockam_api::NodeManager
only starts a node's default services ifskip_defaults
isfalse
. When starting a node that has been stopped,skip_defaults
istrue
(added in PR #3024).
Proposed Changes
This assumes it is okay for skip_defaults
to be ignored in this case.
Is it okay? (@SanjoDeundiak, @mrinalwadhwa)
- Remove
skip_defaults
check, soNodeManager
always starts a node's default services. - Create test case in
ockam_command/tests/commands.bats
that fails without this PR. - Because on CI machines killed processes sometimes do not disappear but are marked
<defunct>
, modify theStopCommand
handler so that a stopped node'spid
is removed from the config file, and hence theStartCommand
handler does not have to rely onkill 0 pid
to test if a process is alive. This was causing https://github.com/build-trust/ockam/issues/3234#issuecomment-1217868771. - Fixes #3234.
Checks
- [x] All commits in this Pull Request are signed and Verified by Github.
- [x] All commits in this Pull Request follow the Ockam commit message convention.
- [x] I accept the Ockam Community Code of Conduct.
- [x] I have accepted the Ockam Contributor License Agreement by adding my Git/Github details in a row at the end of the CONTRIBUTORS.csv file in a separate pull request to the build-trust/ockam-contributors repository.
Hey @neil2468 . I think skip_defaults
allows us to create a fresh node, so that later with following commands you could manually set it up to the desired state (create vault, identity, start services, etc.). Let's not remove it.
Also, the fact that its value is true while node is starting is (I think) desired behaviour. The problem I see here, is that node should persist its state (e.g. what services at which addresses were started), so that it could correctly recreate it during node start. This is what had been missing for some time, latest attempt to solve that is I think StartupConfig
(you can find it in the code). Not sure we'll stick to that approach in the future, but looks like if you want to fix that particular issue with starting services, StartupConfig
is the place to make changes. Thoughts?
P.S. The other change in this PR looks great, would be nice to merge it separately!
Thanks @SanjoDeundiak , I'll take a look at StartupConfig
.
To me, it looks like startup.json
is created and read by ockam_command
, but the node’s default services are created within ockam_api
(which does not use startup.json
).
I would like the starting of services to be next to the code that writes their details to startup.json
. So, should we (1) move the starting of default services to ockam_command
, or (2) have ockam_api
write to startup.json
, or (3) something else? I would vote for (1).
I also noted that it seems only ockam_command/src/tcp/…
is using startup.json
, and when I tried to restart a node with a tcp-listener I got an error (see below). Will startup.json
be used more in the future?
> ockam node create n1
> ockam tcp-listener create 127.0.0.1:7000 --at /node/n1
> ockam node stop n1
> ockam node start n1
Attempting to restart node 'n1'
Running: start node ...ok!
error: Found argument 'transport' which wasn't expected, or isn't valid in this context
USAGE:
ockam [OPTIONS] <SUBCOMMAND>
For more information try --help
Running: create transport ...ok
Thanks in advance!
@neil2468 nice analysis! startup.json was really immature piece of functionality. I think ockam_command
is more about handling CLI commands and communicating with nodes. ockam_api
is where actual functionality and state of the nodes should live. So I think it's 2
. Node should remember which services it started and be able to get everything going again during node start. Unless you think there are problems with this approach