Skupper on ARM keeps restarting

Open michaelalang opened this issue 2 years ago • 6 comments

Skupper 1.5.0 running on a Raspberry Pi 4 keeps restarting with the following logs:

2023-11-13 16:54:38.811339 +0000 SERVER (error) [C2884] Connection from ::1:53398 (to localhost:5672) failed: amqp:connection:framing-error connection aborted
2023-11-13 16:54:38.812755 +0000 SERVER (error) [C2891] Connection from ::1:53470 (to localhost:5672) failed: amqp:connection:framing-error connection aborted
2023-11-13 16:54:38.814129 +0000 SERVER (error) [C2883] Connection from ::1:53396 (to localhost:5672) failed: amqp:connection:framing-error connection aborted

and

2023-11-13 16:55:32.057049 +0000 SERVER (error) [C2880] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671
2023-11-13 16:55:32.227544 +0000 FLOW_LOG (info) LOG [8bkk6:2769] BEGIN END parent=8bkk6:0 logSeverity=3 logText=LOG_SERVER: [C2880] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671 sourceFile=/build/src/server.c sourceLine=1084
2023-11-13 16:55:34.092963 +0000 SERVER (error) [C2881] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671
2023-11-13 16:55:34.235120 +0000 FLOW_LOG (info) LOG [8bkk6:2770] BEGIN END parent=8bkk6:0 logSeverity=3 logText=LOG_SERVER: [C2881] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671 sourceFile=/build/src/server.c sourceLine=1084
2023-11-13 16:55:38.227183 +0000 SERVER (error) [C2882] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671
2023-11-13 16:55:38.246242 +0000 FLOW_LOG (info) LOG [8bkk6:2771] BEGIN END parent=8bkk6:0 logSeverity=3 logText=LOG_SERVER: [C2882] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671 sourceFile=/build/src/server.c sourceLine=1084

The deployment works for roughly 1-4 minutes and even shows remotes and exposed services, and the various services are reachable as expected. After that, it seems that SSL gets out of sync (maybe due to a hardware limitation?) and the pods get restarted. The same behavior then repeats (works for 1-4 minutes, then stops working).

I understand that we do not currently support Skupper on ARM. Still, I want to make everyone aware of a possible issue we might face with ARM-based deployments.
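
The router logs above mix two different failure modes: framing errors on local connections to localhost:5672, and timeouts on connections to 10.246.1.72:55671. To see which one dominates before each restart, the errors can be tallied per condition. The following is a minimal sketch, assuming the log lines keep the format shown in this issue; `tally_errors`, `ERROR_RE`, and `SAMPLE` are illustrative names, not part of Skupper.

```python
import re
from collections import Counter

# Matches router SERVER error lines in the format shown in this issue, e.g.
#   ... SERVER (error) [C2884] Connection from ::1:53398 (to localhost:5672)
#       failed: amqp:connection:framing-error connection aborted
# FLOW_LOG lines repeat the same text after "LOG_SERVER:" and are deliberately
# not matched, so each failure is counted once.
ERROR_RE = re.compile(
    r"SERVER \(error\) \[C\d+\] Connection (?:from|to) \S+.*? failed: "
    r"(?P<condition>[\w:-]+)"
)


def tally_errors(lines):
    """Count SERVER errors per error condition (assumed log format)."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group("condition")] += 1
    return counts


# Sample lines copied from the logs in this issue.
SAMPLE = [
    "2023-11-13 16:54:38.811339 +0000 SERVER (error) [C2884] Connection from ::1:53398 (to localhost:5672) failed: amqp:connection:framing-error connection aborted",
    "2023-11-13 16:55:32.057049 +0000 SERVER (error) [C2880] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671",
    "2023-11-13 16:55:32.227544 +0000 FLOW_LOG (info) LOG [8bkk6:2769] BEGIN END parent=8bkk6:0 logSeverity=3 logText=LOG_SERVER: [C2880] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671 sourceFile=/build/src/server.c sourceLine=1084",
]

for condition, count in tally_errors(SAMPLE).items():
    print(condition, count)
```

Running this over the full router log (one line per list entry) gives a per-condition count that can be compared between the working window and the minutes just before a restart.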

michaelalang avatar Nov 13 '23 17:11 michaelalang

Here's another error I was able to capture in the service-controller/flow-collector pod:

[Beacon detector module starting]
[API module starting]
API server listening on port 8010
Connection to the VAN is open
New ROUTER detected: zhg6s:0
New ROUTER detected: hbpjt:0
New ROUTER detected: qxdth:0
New CONTROLLER detected: cfa7a05c-d9bc-464c-a485-819add8f4a76
Sending FLUSH to sfe.zhg6s:0
Sending FLUSH to sfe.hbpjt:0
New CONTROLLER detected: 6e9774e6-02ff-42c6-8f85-9a63d0734605
New CONTROLLER detected: cb1c35c7-8d48-4eed-9b25-2d07f1ec15b3
New ROUTER detected: qgnsz:0
Sending FLUSH to sfe.qxdth:0
New CONTROLLER detected: 62737c3d-13d4-4c09-82bf-449625b5eeaf
New CONTROLLER detected: af10fd96-bce9-4fb7-8585-87f60810ff9e
New ROUTER detected: rg2dg:0
New ROUTER detected: 8bkk6:0
Sending FLUSH to sfe.cfa7a05c-d9bc-464c-a485-819add8f4a76
events.js:174
      throw er; // Unhandled 'error' event
      ^

TypeError: Cannot read property 'push' of undefined
    at new Record (/usr/src/src/data.js:122:32)
    at Object.exports.IncomingRecord (/usr/src/src/data.js:503:23)
    at recordList.forEach.item (/usr/src/src/network.js:123:18)
    at Array.forEach (<anonymous>)
    at Container.<anonymous> (/usr/src/src/network.js:121:20)
    at Container.emit (events.js:198:13)
    at Container.dispatch (/usr/src/node_modules/rhea/lib/container.js:41:33)
    at Connection.dispatch (/usr/src/node_modules/rhea/lib/connection.js:261:40)
    at Session.dispatch (/usr/src/node_modules/rhea/lib/session.js:456:41)
    at Receiver.link.dispatch (/usr/src/node_modules/rhea/lib/link.js:62:38)
Emitted 'error' event at:
    at Container.dispatch (/usr/src/node_modules/rhea/lib/container.js:41:33)
    at Connection.dispatch (/usr/src/node_modules/rhea/lib/connection.js:261:40)
    at Connection.input (/usr/src/node_modules/rhea/lib/connection.js:574:18)
    at TLSSocket.emit (events.js:198:13)
    at addChunk (_stream_readable.js:288:12)
    at readableAddChunk (_stream_readable.js:269:11)
    at TLSSocket.Readable.push (_stream_readable.js:224:10)
    at TLSWrap.onStreamRead [as onread] (internal/stream_base_commons.js:94:17)

michaelalang avatar Nov 17 '23 09:11 michaelalang

What image is that log from? (It is a Node.js-based image, which is not the standard flow controller.)

grs avatar Nov 17 '23 10:11 grs

@grs it's based on https://github.com/skupperproject/skupper/blob/main/Dockerfile.flow-collector

michaelalang avatar Nov 17 '23 11:11 michaelalang

> @grs it's based on https://github.com/skupperproject/skupper/blob/main/Dockerfile.flow-collector

I don't think it can be, as that is a Go-based collector and the trace is clearly from a Node.js-based program.

grs avatar Nov 17 '23 13:11 grs

For the record, that backtrace is from the prototype collector (Node.js). Can you run "skupper version" in that environment to see what images are being used?

ted-ross avatar Nov 21 '23 16:11 ted-ross

Hi Ted,

I picked the dockerfiles from the repo ... :?

$ skupper -c pi4 -n skupper version
client version                 1.4.1
transport version              quay.example.com/skupper/skupper-router:2.5.0 (sha256:51f8ab009232)
controller version             not-found
config-sync version            quay.example.com/skupper/config-sync:1.5.0 (sha256:e60cfee4c09a)
flow-collector version         not-found

$ oc --context pi4 -n skupper exec -ti deploy/skupper-service-controller -- ./service-controller -version
1.5.0

[runner@skupper-router-ffb9458b9-nvnt8 bin]$ skrouterd -v
0.0.0
[runner@skupper-router-ffb9458b9-nvnt8 bin]$ skmanage --version
0.0.0
[runner@skupper-router-ffb9458b9-nvnt8 bin]$ skstat --version
0.0.0

[root@pi4 skupper-router]# git config remote.origin.url
https://github.com/skupperproject/skupper-router
[root@pi4 skupper-router]# git branch
* main
# Containerfile used for build

[root@pi4 skupper]# git config remote.origin.url
https://github.com/skupperproject/skupper.git
[root@pi4 skupper]# git branch
* (HEAD detached at 1.5.0)
  main

# Dockerfile.ci-test  Dockerfile.config-sync  Dockerfile.controller-podman  Dockerfile.flow-collector  Dockerfile.service-controller  Dockerfile.site-controller used for build 

michaelalang avatar Nov 21 '23 17:11 michaelalang