status-desktop Permission created before making device a control node is not getting active

trafficstars

Description

recover account that is an owner of any community
enable test nets
enable managing communities on test nets
open community - permssions - create permission
note that permission is created with status Pending
make the device a control node
restart the app

As result, the permission is never active even the device is now a control node

Mar 19 '24 08:03 anastasiyaig

@jrainville

Mar 19 '24 08:03 anastasiyaig

Hmm, this is a tough scenario.

What happens here is that an Owner that is not the control node is basically the same as a TokenMaster. They send their changes as events to the control node and then the control node processes the events and re-publishes the community with the changes.

However, in you case, the control node was probably lost, thus why to re-imported. In that case, the correct steps would have been to make that new device the control node before doing any action, since there was no control node.

We have 2 options here:

Improve the UX with some warnings that makes it clear that even though you are the owner, you are not the control node, so any change needs a control node.
When an owner is promoted to control node, if there are pending events by ourselves, we need to apply them
- I'm implying that pending events not from ourselves would be processed, because otherwise that is a bigger issue.

Both of those options are not mutually exclusive, but option 2 needs more work obviously.

@mprakhov @osmaczko can you confirm if the following scenario should work already: control node is dead, an admin sends a community change, it becomes pending. Then the owner re-imports and promotes itself to control node. Does the community change by the admin get processed? Even if it was received while the owner wasn't a control node

Mar 19 '24 17:03 jrainville

@mprakhov @osmaczko can you confirm if the following scenario should work already: control node is dead, an admin sends a community change, it becomes pending. Then the owner re-imports and promotes itself to control node.

I believe it won't work without next event is delivered: control node is dead -> admin sends event -> owner re-imports -> owner promotes self to control node (no action) -> admin sends next event (control node applies all past events).

I believe we should force events processing once we promote ourself to control node, it should be rather straightforward :thinking:

Mar 19 '24 19:03 osmaczko

I believe we should force events processing once we promote ourself to control node, it should be rather straightforward 🤔

Yes I agree. Then that,s the way forward to fix this issue (it's my number 2 point).

I'll schedule the fix for 2.29 as it's not a blocker

Mar 19 '24 19:03 jrainville

On master, attempt to reproduce the problem :

I followed steps 1 - 6, which ended in a crash. Trying to start the app in Step 7. crashes constantly after setting the device as control node in Step 6.

INF 2024-03-20 23:57:04.134-07:00 starting application...                    topics="status-app" tid=659651 file=nim_status_client.nim:205
WRN 2024-03-20 23:57:04.134-07:00 Error decoding signal                      topics="signals-manager" tid=659651 file=signals_manager.nim:49 err="Unknown signal received: mediaserver.started"
DBG 2024-03-20 23:57:08.253-07:00 primary_action                             topics="app-controller" tid=659651 file=module.nim:207 currFlow=AppLogin currState=LoginKeycardEnterPassword
ERROR[03-20|23:57:08.562|github.com/status-im/status-go/api/geth_backend.go:430] failed to initialize wallet db           package=status-go/api.GethStatusBackend err="failed to set `journal_mode` pragma: file is not a database"
failed to set `journal_mode` pragma: file is not a database
ERR 2024-03-20 23:57:08.562-07:00 error:                                     topics="accounts-service" tid=659651 file=service.nim:656 procName=verifyDatabasePassword errDesription="failed to set `journal_mode` pragma: file is not a database"
DBG 2024-03-20 23:57:08.563-07:00 Account logged in                          topics="accounts-service" tid=659651 file=service.nim:676
WARN [03-20|23:57:09.152|github.com/status-im/status-go/api/geth_backend.go:716] fleet is not supported, overriding with default value package=status-go/api.GethStatusBackend fleet= defaultFleet=shards.test
WRN 2024-03-20 23:57:09.167-07:00 Error decoding signal                      topics="signals-manager" tid=659651 file=signals_manager.nim:49 err="Unknown signal received: mediaserver.started"
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
make: *** [Makefile:815: run-linux] Segmentation fault (core dumped)

Still facing the crashes, trying to unblock myself.

Mar 21 '24 07:03 kounkou

Taking a look now. I had to focus on priorities here and here From testing, I managed to reproduce the issue without any crashes.

Currently disambiguating the problem :

credit to Patryk for helping on this

It is clear that the after the control node has regained the control on a community after having lost access (lost datadir or computer lost, etc) we need to process all pending events. Here are some other points I need to answer to make sure the resolution is complete. Requests sent before the control node had lost access and for which the processing was unfinished, can be of 2 types:

The 1:1 requests like RequestToJoin. I need to make sure such request is backedup so that they can re-read after account re-import and control node access regained.
The Admin events on the other side are publicly stored in store nodes inside the community description so processing of events should continue from available community description.

Out of scope :

we use PFS (Perfect Forwarding Secrecy), it might not be possible to get access to 1:1 message once we reimport. I assume Request to join are 1:1 events between the requester and the control node, therefore request to join should be resent when the new control node is set. There might be exception about the Request to join if the event is backed up, where the event can be read after re-import.

Apr 16 '24 19:04 kounkou

Still facing some roadblocks on getting this issue fixed :

1- Integration testing is challenging because the manual testing requires to similate a loss of the control node by deleting the Status/data folder. This is hard to reproduce in an integration test. An idea was to simulate only the portion of the test where the device isn't control node then gain to control node. However, in the integration test this seems difficult because an admin needs to sign the events to be processed. And creating an admin user requires to go through the control node state first...?

2- Currently the update of the permission doesn't produce the expected result with current code. I might be missing something in the fix and understanding the logic is quite complex. So I am still debugging step by step

May 06 '24 07:05 kounkou

Actions from today's meeting

When dealing with Community without Owner token :

[x] The promoting to the control node should be blocked from the Status-go perspective if there's no owner token
- [x] Add a very simple integration test that covers that when no owner token, setting device to control node is forbidden
[x] The promoting to the control node button should be hidden/blocked in the UI #14537

When dealing with community with Owner token :

[x] We implemented an integration test that covers this scenario, and it's PASS2
- [x] Improve the test to check for the 2 token permissions (BECOME_TOKEN_OWNER, BECOME_ADMIN) presence after promoting to control node
[ ] I need to make a manual test that confirms in reality that the integration test reflects reality.

Thank you so much for your help @mprakhov !

May 08 '24 17:05 kounkou

status-desktop status-desktop copied to clipboard

Permission created before making device a control node is not getting active

Description

Out of scope :

status-desktop
status-desktop copied to clipboard