Entire lockup of Guardian system, running locally
Problem description
I started running the Guardian locally yesterday after upgrading to the latest version, 2.23.1, and I have been using it throughout today to build a new client against it.
Specifically, I have only been working with the login, register, token, and session endpoints before I start building out more advanced flows.
After using the Guardian API throughout the day, it froze up completely; immediately before this happened, I was testing the register flows.
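For reference, the calls I have been making look roughly like this (the /api/v1/accounts routes behind the web proxy on port 3000 are an assumption based on my local setup; the password and token values are placeholders):
# Log in and fetch the current session (routes are assumptions -- adjust to your Guardian version)
curl -s -X POST http://localhost:3000/api/v1/accounts/login \
  -H 'Content-Type: application/json' \
  -d '{"username": "dovuauthority", "password": "<password>"}'
curl -s http://localhost:3000/api/v1/accounts/session \
  -H 'Authorization: Bearer <access token>'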
Steps to reproduce
Reproducibility in this case will be tricky, but I have run a couple of commands, including:
docker compose down
docker compose up -d
Running docker ps yields this:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
98f4298c7ffb hedera-guardian-web-proxy "/docker-entrypoint.…" 13 minutes ago Up 13 minutes 0.0.0.0:3000->80/tcp hedera-guardian-web-proxy-1
878809b6661b hedera-guardian-api-gateway "docker-entrypoint.s…" 13 minutes ago Up 13 minutes 3002/tcp, 6555/tcp hedera-guardian-api-gateway-1
bc5a50efae6a hedera-guardian-application-events "docker-entrypoint.s…" 13 minutes ago Up 13 minutes 0.0.0.0:3012->3012/tcp hedera-guardian-application-events-1
7d1be65e7575 hedera-guardian-guardian-service "docker-entrypoint.s…" 13 minutes ago Up 13 minutes 0.0.0.0:5007->5007/tcp, 6555/tcp hedera-guardian-guardian-service-1
aca8d17e50ed hedera-guardian-policy-service "docker-entrypoint.s…" 13 minutes ago Up 13 minutes 0/tcp, 0.0.0.0:5006->5006/tcp hedera-guardian-policy-service-1
6e46385684f2 hedera-guardian-worker-service "docker-entrypoint.s…" 13 minutes ago Up 13 minutes 6555/tcp hedera-guardian-worker-service-2
0672d5e935ed hedera-guardian-worker-service "docker-entrypoint.s…" 13 minutes ago Up 13 minutes 6555/tcp hedera-guardian-worker-service-1
908ecddee2a0 hedera-guardian-auth-service "docker-entrypoint.s…" 13 minutes ago Up 13 minutes 0.0.0.0:5005->5005/tcp, 6555/tcp hedera-guardian-auth-service-1
d3d411240b71 hedera-guardian-notification-service "docker-entrypoint.s…" 13 minutes ago Up 13 minutes hedera-guardian-notification-service-1
a481402a9b6f hedera-guardian-logger-service "docker-entrypoint.s…" 13 minutes ago Up 13 minutes 6555/tcp hedera-guardian-logger-service-1
0d307384c4ac mongo-express:1.0.2-20 "/sbin/tini -- /dock…" 13 minutes ago Up 13 minutes 8081/tcp hedera-guardian-mongo-express-1
b8d87a862b6c ipfs/kubo:v0.26.0 "/sbin/tini -- /usr/…" 13 minutes ago Up 13 minutes (healthy) 0.0.0.0:4001->4001/tcp, 0.0.0.0:5001->5001/tcp, 4001/udp, 0.0.0.0:8080->8080/tcp, 8081/tcp hedera-guardian-ipfs-node-1
59496daff47c nats:2.9.24 "/nats-server --http…" 13 minutes ago Up 13 minutes 4222/tcp, 6222/tcp, 0.0.0.0:8222->8222/tcp hedera-guardian-message-broker-1
3ab891ae2180 prom/prometheus:v2.44.0 "/bin/prometheus --c…" 13 minutes ago Up 13 minutes 0.0.0.0:9090->9090/tcp hedera-guardian-prometheus-1
cf7285271fa1 mongo:6.0.13 "docker-entrypoint.s…" 13 minutes ago Up 13 minutes 27017/tcp hedera-guardian-mongo-1
52f4e7100776 hedera-guardian-topic-viewer "docker-entrypoint.s…" 13 minutes ago Up 13 minutes 3006/tcp, 0.0.0.0:5009->5009/tcp hedera-guardian-topic-viewer-1
b9fd0330a2b1 hashicorp/vault:1.12.11 "docker-entrypoint.s…" 13 minutes ago Up 13 minutes 0.0.0.0:8200->8200/tcp hedera-guardian-vault-1
66213f53878f grafana/grafana:10.0.10 "/run.sh" 13 minutes ago Up 13 minutes 3000/tcp, 0.0.0.0:9080->9080/tcp hedera-guardian-grafana-1
def27ca4aef0 hedera-guardian-mrv-sender "docker-entrypoint.s…" 13 minutes ago Up 13 minutes 3003/tcp, 0.0.0.0:5008->5008/tcp hedera-guardian-mrv-sender-1
And as a point of confirmation, the API stopped working.
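If it helps, this is how I confirmed that: a simple request through the web proxy just hangs until the timeout, using the same assumed /api/v1/accounts/session route as above.
# The request below hangs / times out rather than returning 200 or 401
curl -v --max-time 10 http://localhost:3000/api/v1/accounts/session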
Expected behavior
Simply for the Guardian to run and work as expected.
The question would be: do any of the authentication methods change state or have side effects?
Screenshots
I tried using the system this morning, just logging in. While I could see the login screen, I received this error notification in the bottom right of the screen.
In addition, the entire API appears to be down. The question would be: is this a case of the database becoming corrupted?
I am running on macOS 13.4.1, and I did initially manage to build and run the software successfully, with test API calls to some basic authentication endpoints.
I am using a "develop" env file, in conjunction with docker compose up -d --build.
And, before this upgrade, I ran the troubleshooting directions:
docker builder prune --all
docker compose build --no-cache
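Putting that together, the full clean rebuild I ran around the upgrade was roughly the following (the final down -v variant is destructive and wipes the Mongo volume, so only use it if losing local data is acceptable):
docker compose down
docker builder prune --all
docker compose build --no-cache
docker compose up -d --build
# Destructive variant: also remove volumes (wipes local Mongo data)
# docker compose down -v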
Okay, so I continued digging, and I found out that my standard registry user had been deleted.
Effectively, I have a test that checks that an already-registered user cannot be registered again -- for the standard registry user, we use the name 'dovuauthority'.
If you look at the screenshot, I was able to register a duplicate of that user, so it looks like data was lost.
The image below shows the test I run to check registration of a user with the standard registry role.
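In case the screenshot does not come through: the check is roughly equivalent to the curl call below. The route, body fields, and role value here are assumptions on my part, as the real test goes through our client code.
# Attempt to register the standard registry user a second time -- this is expected to fail with a "user exists" error
curl -s -X POST http://localhost:3000/api/v1/accounts/register \
  -H 'Content-Type: application/json' \
  -d '{"username": "dovuauthority", "password": "<password>", "role": "STANDARD_REGISTRY"}'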
So, I logged in with the new standard registry user, and it's freezing up on the initial registration screen for setting up your keys.
Has anyone else experienced this behaviour?
Okay, another update.
I believe I have resolved the issue (though not the apparent data loss that was suffered), and I will continue on my R&D quest.
I am able to log in as a user and run my tests again. It feels inconsistent, but if I hit this issue again, I will add more comments.
Okay, I think I understand what is happening.
I've had complete DB data loss three times now in the last week on my local machine.
This tends to happen when a task reaches a state in which it is unable to finish -- which then has an almost "infinite loop" effect system-wide.
Perhaps the data is in an invalid state for the logic to continue?
What I have observed is that, while there are timeout functions for tasks, the system seems to keep creating new tasks on top (or loops), which then blocks up the entire task queue and the underlying event loop.
This has the downstream effect that folks are unable to view data through the Guardian frontend, as the ability to fetch data from the API is completely blocked.
This could be related to https://github.com/hashgraph/guardian/issues/3520 (only time will tell).
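One way I have been sanity-checking the blocked-queue theory is via the NATS message broker's HTTP monitoring port, which the compose setup above exposes on 8222 (/varz and /connz are standard NATS monitoring endpoints; reading a stuck task queue out of them is my own interpretation):
# Inspect the message broker the Guardian services communicate over
curl -s http://localhost:8222/varz    # general server stats
curl -s http://localhost:8222/connz   # per-connection stats, including pending bytes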