teleport icon indicating copy to clipboard operation
teleport copied to clipboard

Misleading error message in Teleport agent database logs

Open programmerq opened this issue 1 year ago • 3 comments

Expected behavior:

I am not sure if the current behavior is expected-- if one creates a dynamic database resource, it will prevent loading of cloud databases that share the same name. If that is indeed the expected behavior, then logging should be improved. This was written with logging in mind, but changing the underlying behavior might also be helpful.

Teleport should provide clearer and more informative debugging and logging messages when handling cloud and dynamic databases. This includes specifying what the agent is attempting to match against when looping over all database resources and throwing a clearer error when there is a conflict between a dynamic and cloud database of the same name.

Current behavior:

When there is a conflict between a dynamic and cloud database of the same name, the dynamic resource "wins" and prevents agents from trying to load a cloud database of the same name.

Additionally, the debug error messages don't indicate the true cause for the database not being loaded. This leads to confusion and really made troubleshooting difficult. The message "database does not match" does not clearly explain the issue of a conflict between a dynamic and cloud database.

This appears to be the line that skips over a resource with the same name but that isn't a cloud type resource, which effectively makes an existing dynamic object take precedence and prevent the loading of a matching cloud database:

https://github.com/gravitational/teleport/blob/v13.3.6/lib/srv/db/watcher.go#L212-L216

Bug details:

  • Version: observed on several 13.2.x and 13.3.x versions.

The issue that prompted this logging change request was a cloud database that wasn't showing up in the web ui, nor in tsh db ls. Confusingly, it was the only database that showed up in tctl get db.

The logging message in the agent implied that there was an issue with the aws database itself-- that perhaps it had an incorrect tag:

2023-08-23T00:48:02Z DEBU [DB:SERVIC] db dbname doesn't match, not creating. services/reconciler.go:154

It turned out that there was a separately created dynamic db object that had the same dbname name. The debug level logs should be improved to clarify what is being matched and why something does or does not match.

Additionally, it was not clear why the same agent also listed several other database names that were in other AWS accounts, which had the same message.

For the non-affected databases in the cloud account this agent was set up to serve, the following log message appeared:

2023-08-23T00:48:02Z DEBU [DB:SERVIC] db dbprod is already registered. services/reconciler.go:182

A more robust error message phrase would better differentiate what match is being done and why a database is or is not loaded.


Given a cluster that has two agents, each that are expected to load four database objects, but there is an existing dynamic database object:

  • agent a:
    • a-dev
    • a-qa
    • a-stage
    • a-prod
  • agent b:
    • b-dev
    • b-qa
    • b-stage
    • b-prod

existing dynamic db object b-prod

The current log output for agent b:

DEBU [DB:SERVIC] Reconciling 3 current resources with 8 new resources. services/reconciler.go:102
DEBU [DB:SERVIC] db a-dev doesn't match, not creating. services/reconciler.go:154
DEBU [DB:SERVIC] db b-stage is already registered. services/reconciler.go:182
DEBU [DB:SERVIC] db b-dev is already registered. services/reconciler.go:182
DEBU [DB:SERVIC] db a-stage doesn't match, not creating. services/reconciler.go:154
DEBU [DB:SERVIC] db a-qa doesn't match, not creating. services/reconciler.go:154
DEBU [DB:SERVIC] db a-prod doesn't match, not creating. services/reconciler.go:154
DEBU [DB:SERVIC] db b-prod doesn't match, not creating. services/reconciler.go:154
DEBU [DB:SERVIC] db b-qa is already registered. services/reconciler.go:182

There isn't a significant differentiation between the a-prod and b-prod lines. The fact that an existing b-prod dynamic db resource is getting in the way.

Here's an alternate log message that might be more helpful:

DEBU [DB:SERVIC] Reconciling 3 current resources with 8 new resources. services/reconciler.go:102
DEBU [DB:SERVIC] teleport cloud db a-dev is managed outside this agent. skipping. services/reconciler.go:154
DEBU [DB:SERVIC] teleport cloud db b-stage is already registered. services/reconciler.go:182
DEBU [DB:SERVIC] teleport cloud db b-dev is already registered. services/reconciler.go:182
DEBU [DB:SERVIC] teleport cloud db a-stage is managed outside this agent. skipping. services/reconciler.go:154
DEBU [DB:SERVIC] teleport cloud db a-qa is managed outside this agent. skipping. services/reconciler.go:154
DEBU [DB:SERVIC] teleport cloud db a-prod is managed outside this agent. skipping. services/reconciler.go:154
DEBU [DB:SERVIC] teleport dynamic db b-prod is managed outside this agent. skipping. services/reconciler.go:154
DEBU [DB:SERVIC] teleport cloud db b-qa is already registered. services/reconciler.go:182

programmerq avatar Aug 28 '23 16:08 programmerq