mq-container
gsk8capicmd_64 uses 100% CPU and freezes mq container startup on AMD CPUs
Hello!
When my laptop is connected to a power source, the gsk8capicmd_64 call never finishes, and one thread uses 100% of the processor. The problem goes away when the laptop runs on battery. I don't understand why, or how the two are related.
Reproduced on a ROG Zephyrus G15 GA503 GA503QM-HN094 (AMD Ryzen 7 5800HS with Radeon). Environment: Windows 10 Pro 20H2 build 19042.1110 with all updates as of 17/07/2021. Docker Toolbox v19.03.1 runs on VirtualBox 6.1.22 r144080 (Qt5.6.2) with extension pack 6.1.22 r144080.
Just run:
docker run \
--env LICENSE=accept \
--env MQ_QMGR_NAME=QM1 \
--publish 1414:1414 \
--publish 9443:9443 \
--detach \
ibmcom/mq
And the bug will happen.
https://github.com/ibm-messaging/mq-container/blob/4580cecf4973107dff184e8cbbcf9ac7f5b4e7df/internal/keystore/keystore.go#L192
The problem is solved if I give all the processor cores to the Linux virtual machine where Docker is installed, or if I set the paravirtualization interface for this machine to Hyper-V.
With the first solution, the CPU is still loaded to 100%, this time by the Java process in the mq container, although restarting the container has a chance (only a chance) of fixing that. The second solution looks more stable; so far, there have been no problems.
I am not sure that either solution is stable.
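For anyone who wants to try the second workaround from the command line, here is a minimal sketch. It assumes the Docker Toolbox VM is named default (check with VBoxManage list vms) and that the VM is powered off, since modifyvm only changes settings on a stopped VM. The function name is made up for illustration.

```shell
# Hypothetical helper: switch a VirtualBox VM's paravirtualization
# interface to Hyper-V. The VM name "default" is an assumption.
set_paravirt_hyperv() {
  vm="${1:-default}"
  # The VM must be powered off before this setting can be changed.
  VBoxManage modifyvm "$vm" --paravirtprovider hyperv
}
```

Run set_paravirt_hyperv (optionally with the VM name as an argument), then start the VM again. The first workaround would use the same modifyvm command with --cpus and the desired core count instead.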
Same problem here :( I'm running the container on Ubuntu 20.04 LTS, AMD Ryzen 7 5800, Lenovo Legion 5 Pro.
I have a similar situation: 100% CPU load from gsk8capicmd_64.
My hardware/software setup:
AMD Ryzen 7 5800H
Windows 10.0.19042.1237 with WSL 2, kernel version 5.10.16
Docker 20.10.8 build 3967b7d
AMD Ryzen 7 5800H (Lenovo Legion 5) Fedora 36 (kernel 5.17.12-300.fc36.x86_64), Docker Desktop 4.9.0 (docker 20.10.16)
Same issue: /opt/mqm/gskit8/bin/gsk8capicmd_64 -keydb -create -type cms -db /run/runmqserver/tls/key.kdb -pw JtM27EP7L2LH -stash
loads the CPU to 100% and takes anywhere from 6 to 40 minutes (randomly).
@arthurbarr could you escalate this? This is a productivity killer for developers working on AMD Ryzen setups. I guess the original issue comes from some GSKit8 bug.
Hi @andreysaksonov , @mihmig , @jfmatheusg , @kalekhin .
Arthur has asked me to look into this. gsk8capicmd is owned by an internal IBM team separate from IBM MQ, but I can raise a support ticket asking them to take a look. To help them diagnose the issue, they are likely to want a trace of it.
Please could you run the same commands as before that caused the 100% CPU issue, adding the -trace <file> option. For example, taking @andreysaksonov's command, I would run: /opt/mqm/gskit8/bin/gsk8capicmd_64 -keydb -create -type cms -db /run/runmqserver/tls/key.kdb -pw JtM27EP7L2LH -stash -trace /tmp/trace.output
Please then send me the generated file; in the previous example I would need /tmp/trace.output.
You can send the file either by attaching it to a comment here or directly via email to [email protected].
Could you also let me know what version of MQ you are using? This is best done via the dspmqver command. You can also tell me directly which version of GSKit you are using via dspmqver -p 65.
In the meantime I'll get the ball rolling with GSKit and hopefully we can get to the bottom of this.
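The details requested above can also be gathered from the host without opening a shell in the container. This is only a sketch: the container name ibmmq and the function name are assumptions, and the trace path has to match whatever was passed to -trace.

```shell
# Hypothetical helper: print MQ and GSKit versions, then copy the
# trace file out of the container into the current directory.
collect_gskit_details() {
  container="${1:-ibmmq}"                 # assumed container name
  trace_path="${2:-/tmp/trace.output}"    # path given to -trace
  # Reports the MQ version followed by the GSKit component version.
  docker exec "$container" dspmqver -p 65
  docker cp "$container:$trace_path" .
}
```

For example, collect_gskit_details ibmmq /tmp/trace.output leaves trace.output in the working directory, ready to attach to a comment.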
@parrobe
docker rm ibmmq && docker run -e LICENSE=accept -e DEBUG=true -e MQ_QMGR_NAME=QM1 -p 9443:9443 --name ibmmq icr.io/ibm-messaging/mq:latest
❯ docker exec -it ibmmq /bin/bash
bash-4.4$ ps auxwwf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1001 49 0.1 0.0 35120 4284 pts/0 Ss 14:45 0:00 /bin/bash
1001 55 0.0 0.0 47616 3544 pts/0 R+ 14:45 0:00 \_ ps auxwwf
1001 1 0.5 0.1 1290456 14472 ? Ssl 14:45 0:00 runmqserver -nologruntime -dev
1001 43 0.0 0.0 34988 4068 ? S 14:45 0:00 /bin/sh /opt/mqm/bin/runmqakm -keydb -create -type cms -db /run/runmqserver/tls/key.kdb -pw zxDekBysHQ7S -stash
1001 48 100 0.0 46816 11652 ? R 14:45 0:15 \_ /opt/mqm/gskit8/bin/gsk8capicmd_64 -keydb -create -type cms -db /run/runmqserver/tls/key.kdb -pw zxDekBysHQ7S -stash
bash-4.4$ /bin/sh /opt/mqm/bin/runmqakm -keydb -create -type cms -db /run/runmqserver/tls/key.kdb -pw zxDekBysHQ7S -stash -trace /tmp/zxDekBysHQ7S.trace
bash-4.4$ date
Tue Jun 7 14:46:05 UTC 2022
bash-4.4$ /bin/sh /opt/mqm/bin/runmqakm -keydb -create -type cms -db /run/runmqserver/tls/key.kdb -pw zxDekBysHQ7S -stash -trace /tmp/zxDekBysHQ7S.trace
CTGSK3036W The output file "/run/runmqserver/tls/key.kdb" already exists.
bash-4.4$ date
Tue Jun 7 14:46:22 UTC 2022
bash-4.4$ exit
exit
❯ docker cp ibmmq:/tmp/zxDekBysHQ7S.trace .
❯ docker exec -it ibmmq /bin/bash
bash-4.4$ dspmqver
Name: IBM MQ
Version: 9.2.5.0
Level: p925-L220207-CSU01-L220405.DE
BuildType: IKAP - (Production)
Platform: IBM MQ for Linux (x86-64 platform)
Mode: 64-bit
O/S: Linux 5.10.104-linuxkit
O/S Details: Red Hat Enterprise Linux 8.6 (Ootpa)
InstName: Installation1
InstDesc: IBM MQ V9.2.5.0 (Unzipped)
Primary: N/A
InstPath: /opt/mqm
DataPath: /mnt/mqm/data
MaxCmdLevel: 925
LicenseType: Developer
bash-4.4$ dspmqver -p 65
Name: IBM MQ
Version: 9.2.5.0
Level: p925-L220207-CSU01-L220405.DE
BuildType: IKAP - (Production)
Platform: IBM MQ for Linux (x86-64 platform)
Mode: 64-bit
O/S: Linux 5.10.104-linuxkit
O/S Details: Red Hat Enterprise Linux 8.6 (Ootpa)
InstName: Installation1
InstDesc: IBM MQ V9.2.5.0 (Unzipped)
Primary: N/A
InstPath: /opt/mqm
DataPath: /mnt/mqm/data
MaxCmdLevel: 925
LicenseType: Developer
AMQ8250I: The 32-bit GSKit component is not installed.
Name: IBM Global Security Kit for IBM MQ
Version: 8.0.55.26
BuildType: Production
Mode: 64-bit
bash-4.4$
As you can see, the drama of the situation is that the command hangs only when it is spawned by runmqserver -nologruntime -dev; when I run it from a new shell in the container, it does not hang. I've attached the trace file anyway.
Thanks @andreysaksonov - I've passed these details onto the GSKit team. I will let you know when they have responded.
Hi @andreysaksonov - We've heard back from GSKit now. They suspect this is an issue they have seen with some AMD processors, where their RNG module hangs due to a difference in the AMD chip's clock. They have asked us to retry with the following environment variable set as a workaround, to see whether the issue resolves:
Please set ICC_SHIFT=3 when creating your container so it is present for the container startup. Please run the trace again if the issue has not resolved.
Yes, it solves the issue, thanks. I will leave a link to the original GSKit bug: https://www.ibm.com/support/pages/apar/IJ28497
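For reference, one way to apply the ICC_SHIFT=3 workaround is to pass it as an extra --env flag on the docker run command from earlier in the thread. This is a sketch rather than an official recipe: the wrapper function and the container name ibmmq are assumptions, and the image tag is the one used in the trace session above.

```shell
# Hypothetical wrapper: start a queue manager with ICC_SHIFT=3 set at
# container creation, so it is present when runmqserver starts.
start_mq_with_workaround() {
  docker run \
    --env LICENSE=accept \
    --env MQ_QMGR_NAME=QM1 \
    --env ICC_SHIFT=3 \
    --publish 1414:1414 \
    --publish 9443:9443 \
    --detach \
    --name "${1:-ibmmq}" \
    icr.io/ibm-messaging/mq:latest
}
```

Setting the variable with docker exec after the container is already running would be too late, because the key database is created during startup.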