skywire-testnet icon indicating copy to clipboard operation
skywire-testnet copied to clipboard

After node restart, router receives packet but fails to foward to app.

Open evanlinjin opened this issue 6 years ago • 7 comments

Given a situation where we have 3 nodes: A, B and C. In which;

  • A has a transport with B.
  • B has a transport with C.
  • A has a route to C via B as intermediary.

When B (the intermediary node) restarts;

  • A and C successfully re-establish transports to B.
  • A and C successfully re-establish their route to one another.
  • A can send and receive packets to and from C.

However;

  • A can no longer deliver the right packet to app of the closed route.
  • C can no longer deliver the right packet to app of the closed route.

evanlinjin avatar May 10 '19 04:05 evanlinjin

Logs of A before restart of B:

[2019-05-10T16:59:48+12:00] INFO [chat.v1.0]: 2019/05/10 16:59:48 received and trashed: {"message":"Hello Joe!","sender":"031b80cd5773143a39d940dc0710b93dcccc262a85108018a7a95ab9af734f8055"}
[2019-05-10T16:59:48+12:00] INFO [router]: Forwarded packet to App on Port 1
[2019-05-10T16:59:48+12:00] INFO [router]: Got new App request with type Send: {"addr":{"port":1,"remote":{"pk":"031b80cd5773143a39d940dc0710b93dcccc262a85108018a7a95ab9af734f8055","port":11}},"payload":"SGVsbG8gTWlrZSE="}
[2019-05-10T16:59:48+12:00] INFO [router]: Forwarded App packet from LocalPort 1 using route ID 4
[2019-05-10T16:59:48+12:00] INFO [router]: Got new remote packet with route ID 6. Using rule: App: <resp-rid: 5><remote-pk: 031b80cd5773143a39d940dc0710b93dcccc262a85108018a7a95ab9af734f8055><remote-port: 11><local-port: 1>
[2019-05-10T16:59:48+12:00] INFO [chat.v1.0]: 2019/05/10 16:59:48 received and trashed: {"message":"Hello Joe!","sender":"031b80cd5773143a39d940dc0710b93dcccc262a85108018a7a95ab9af734f8055"}
[2019-05-10T16:59:48+12:00] INFO [router]: Forwarded packet to App on Port 1
[2019-05-10T16:59:48+12:00] INFO [router]: Got new App request with type Send: {"addr":{"port":1,"remote":{"pk":"031b80cd5773143a39d940dc0710b93dcccc262a85108018a7a95ab9af734f8055","port":11}},"payload":"SGVsbG8gTWlrZSE="}
[2019-05-10T16:59:48+12:00] INFO [router]: Forwarded App packet from LocalPort 1 using route ID 4
[2019-05-10T16:59:48+12:00] INFO [router]: Got new remote packet with route ID 6. Using rule: App: <resp-rid: 5><remote-pk: 031b80cd5773143a39d940dc0710b93dcccc262a85108018a7a95ab9af734f8055><remote-port: 11><local-port: 1>
[2019-05-10T16:59:48+12:00] INFO [chat.v1.0]: 2019/05/10 16:59:48 received and trashed: {"message":"Hello Joe!","sender":"031b80cd5773143a39d940dc0710b93dcccc262a85108018a7a95ab9af734f8055"}
[2019-05-10T16:59:48+12:00] INFO [router]: Forwarded packet to App on Port 1

Logs of B before restart of B:

[2019-05-10T16:58:44+12:00] INFO [router]: Forwarded packet via Transport 236e33a2-8bc1-5b1d-b168-593931d16111 using rule 6
[2019-05-10T16:58:44+12:00] INFO [router]: Got new remote packet with route ID 4. Using rule: Forward: <next-rid: 4><next-tid: 38254102-9309-5c24-85c0-d3098eb81256>
[2019-05-10T16:58:44+12:00] INFO [router]: Forwarded packet via Transport 38254102-9309-5c24-85c0-d3098eb81256 using rule 4
[2019-05-10T16:58:44+12:00] INFO [router]: Got new remote packet with route ID 5. Using rule: Forward: <next-rid: 6><next-tid: 236e33a2-8bc1-5b1d-b168-593931d16111>
[2019-05-10T16:58:44+12:00] INFO [router]: Forwarded packet via Transport 236e33a2-8bc1-5b1d-b168-593931d16111 using rule 6
[2019-05-10T16:58:44+12:00] INFO [router]: Got new remote packet with route ID 4. Using rule: Forward: <next-rid: 4><next-tid: 38254102-9309-5c24-85c0-d3098eb81256>
[2019-05-10T16:58:44+12:00] INFO [router]: Forwarded packet via Transport 38254102-9309-5c24-85c0-d3098eb81256 using rule 4

Logs of C before restart of B:

[2019-05-10T16:58:06+12:00] INFO [router]: Got new App request with type Send: {"addr":{"port":11,"remote":{"pk":"024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7","port":1}},"payload":"SGVsbG8gSm9lIQ=="}
[2019-05-10T16:58:06+12:00] INFO [router]: Forwarded App packet from LocalPort 11 using route ID 5
[2019-05-10T16:58:06+12:00] INFO [router]: Got new remote packet with route ID 4. Using rule: App: <resp-rid: 5><remote-pk: 024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7><remote-port: 1><local-port: 11>
[2019-05-10T16:58:06+12:00] INFO [chat.v1.0]: 2019/05/10 16:58:06 received and trashed: {"message":"Hello Mike!","sender":"024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7"}
[2019-05-10T16:58:06+12:00] INFO [router]: Forwarded packet to App on Port 11
[2019-05-10T16:58:06+12:00] INFO [router]: Got new App request with type Send: {"addr":{"port":11,"remote":{"pk":"024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7","port":1}},"payload":"SGVsbG8gSm9lIQ=="}
[2019-05-10T16:58:06+12:00] INFO [router]: Forwarded App packet from LocalPort 11 using route ID 5
[2019-05-10T16:58:06+12:00] INFO [router]: Got new remote packet with route ID 4. Using rule: App: <resp-rid: 5><remote-pk: 024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7><remote-port: 1><local-port: 11>
[2019-05-10T16:58:06+12:00] INFO [chat.v1.0]: 2019/05/10 16:58:06 received and trashed: {"message":"Hello Mike!","sender":"024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7"}

evanlinjin avatar May 10 '19 05:05 evanlinjin

Logs of B when restarting B:

[2019-05-10T17:00:38+12:00] WARN [trmanager]: Failed to change transport status: status: 401, error: {"error":{"message":"SW-Nonce does not match","code":401}}

Hence, this issue may be related to #345

evanlinjin avatar May 10 '19 05:05 evanlinjin

Logs of A after restart of B:

[2019-05-10T17:03:10+12:00] INFO [router]: Got new App request with type Send: {"addr":{"port":1,"remote":{"pk":"031b80cd5773143a39d940dc0710b93dcccc262a85108018a7a95ab9af734f8055","port":11}},"payload":"SGVsbG8gTWlrZSE="}
[2019-05-10T17:03:10+12:00] INFO [router]: Forwarded App packet from LocalPort 1 using route ID 4
[2019-05-10T17:03:10+12:00] INFO [router]: Got new App request with type Send: {"addr":{"port":1,"remote":{"pk":"031b80cd5773143a39d940dc0710b93dcccc262a85108018a7a95ab9af734f8055","port":11}},"payload":"SGVsbG8gTWlrZSE="}
[2019-05-10T17:03:10+12:00] INFO [router]: Forwarded App packet from LocalPort 1 using route ID 4
[2019-05-10T17:03:10+12:00] INFO [router]: Got new App request with type Send: {"addr":{"port":1,"remote":{"pk":"031b80cd5773143a39d940dc0710b93dcccc262a85108018a7a95ab9af734f8055","port":11}},"payload":"SGVsbG8gTWlrZSE="}
[2019-05-10T17:03:10+12:00] INFO [router]: Forwarded App packet from LocalPort 1 using route ID 4

Logs of B after restart of B:

[2019-05-10T17:04:36+12:00] INFO [router]: Got new remote packet with route ID 4. Using rule: Forward: <next-rid: 4><next-tid: 38254102-9309-5c24-85c0-d3098eb81256>
[2019-05-10T17:04:36+12:00] INFO [router]: Forwarded packet via Transport 38254102-9309-5c24-85c0-d3098eb81256 using rule 4
[2019-05-10T17:04:37+12:00] INFO [router]: Got new remote packet with route ID 5. Using rule: Forward: <next-rid: 6><next-tid: 236e33a2-8bc1-5b1d-b168-593931d16111>
[2019-05-10T17:04:37+12:00] INFO [router]: Forwarded packet via Transport 236e33a2-8bc1-5b1d-b168-593931d16111 using rule 6
[2019-05-10T17:04:37+12:00] INFO [router]: Got new remote packet with route ID 4. Using rule: Forward: <next-rid: 4><next-tid: 38254102-9309-5c24-85c0-d3098eb81256>
[2019-05-10T17:04:37+12:00] INFO [router]: Forwarded packet via Transport 38254102-9309-5c24-85c0-d3098eb81256 using rule 4

Logs of C after restart of B:

[2019-05-10T17:05:13+12:00] INFO [router]: Got new App request with type Send: {"addr":{"port":11,"remote":{"pk":"024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7","port":1}},"payload":"SGVsbG8gSm9lIQ=="}
[2019-05-10T17:05:13+12:00] INFO [router]: Forwarded App packet from LocalPort 11 using route ID 5
[2019-05-10T17:05:13+12:00] INFO [router]: Got new App request with type Send: {"addr":{"port":11,"remote":{"pk":"024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7","port":1}},"payload":"SGVsbG8gSm9lIQ=="}
[2019-05-10T17:05:13+12:00] INFO [router]: Forwarded App packet from LocalPort 11 using route ID 5
[2019-05-10T17:05:13+12:00] INFO [router]: Got new App request with type Send: {"addr":{"port":11,"remote":{"pk":"024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7","port":1}},"payload":"SGVsbG8gSm9lIQ=="}
[2019-05-10T17:05:13+12:00] INFO [router]: Forwarded App packet from LocalPort 11 using route ID 5
[2019-05-10T17:05:14+12:00] INFO [router]: Got new App request with type Send: {"addr":{"port":11,"remote":{"pk":"024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7","port":1}},"payload":"SGVsbG8gSm9lIQ=="}
[2019-05-10T17:05:14+12:00] INFO [router]: Forwarded App packet from LocalPort 11 using route ID 5
[2019-05-10T17:05:14+12:00] INFO [router]: Got new App request with type Send: {"addr":{"port":11,"remote":{"pk":"024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7","port":1}},"payload":"SGVsbG8gSm9lIQ=="}
[2019-05-10T17:05:14+12:00] INFO [router]: Forwarded App packet from LocalPort 11 using route ID 5

evanlinjin avatar May 10 '19 05:05 evanlinjin

I hope my above comments helps with understanding the bug.

evanlinjin avatar May 10 '19 05:05 evanlinjin

I see that these logs belong to #336, however I have not been able to reproduce such error, logs after B restarts are the same than before on every node, chat app keeps working and I can see how both A and C deliver the packet to their apps.

Have you re-tested this before your last commit?. Was it happening on a 100% chance?.

ivcosla avatar May 10 '19 15:05 ivcosla

Hey @ivcosla, you may need to try restart node B multiple times. And note that this is no error, but “unexpected behavior” haha.

[2019-05-10T16:58:06+12:00] INFO [chat.v1.0]: 2019/05/10 16:58:06 received and trashed: {"message":"Hello Mike!","sender":"024ec47420176680816e0406250e7156465e4531f5b26057c9f6297bb0303558c7"}

When the above log message disappears, we know that chat app no longer receives the packets.

evanlinjin avatar May 13 '19 00:05 evanlinjin

I see a chacha error after the restart of B that causes this behavior, it's triggered by it when it reconnects to A and C.

#345 doesn't seem to be related to this error. Further checks that I want to do are on B startup as well as in route creation (check if the error exists in this branch https://github.com/skycoin/skywire/pull/344).

ivcosla avatar May 13 '19 21:05 ivcosla