graylog2-server icon indicating copy to clipboard operation
graylog2-server copied to clipboard

Add unique index for users.username

Open danotorrey opened this issue 2 years ago • 5 comments

What?

Add a unique index for the users.username field.

Why?

At least two instances have occurred when customers were upgrading to Graylog 4.3.2 where duplicate users (with the same username) were found and resulted in failed migrations.

For example for the graylog-sidecar user:

2022-06-03T10:22:18.961-05:00 ERROR [UserServiceImpl] There was more than one matching user for username graylog-sidecar. This should never happen.
2022-06-03T10:22:18.961-05:00 WARN  [ServerBootstrap] Exception while running migrations
java.lang.RuntimeException: There was more than one matching user for username graylog-sidecar. This should never happen.
        at org.graylog2.users.UserServiceImpl.load(UserServiceImpl.java:175) ~[graylog.jar:?]
        at org.graylog2.migrations.MigrationHelpers.ensureUser(MigrationHelpers.java:98) ~[graylog.jar:?]
        at org.graylog.plugins.sidecar.migrations.V20180323150000_AddSidecarUser.upgrade(V20180323150000_AddSidecarUser.java:60) ~[graylog.jar:?]
        at org.graylog2.bootstrap.ServerBootstrap.lambda$runMigrations$0(ServerBootstrap.java:263) ~[graylog.jar:?]
        at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:422) ~[graylog.jar:?]
        at com.google.common.collect.RegularImmutableSortedSet.forEach(RegularImmutableSortedSet.java:8

This also happened for another user during a 4.2.9-4.3.2 upgrade for the Forwarder system user, which is added in this migration:

https://github.com/Graylog2/graylog-plugin-enterprise/blob/5ea3908413513f3a344d936a0d86780572b7776e/enterprise/src/main/java/org/graylog/plugins/forwarder/migrations/V20200918000000_AddForwarderUser.java#L40-L47

Your Environment

  • Graylog Version: 4.3.2
  • Elasticsearch Version:
  • MongoDB Version: 4.2.20
  • Operating System:
  • Browser version:

danotorrey avatar Jun 28 '22 19:06 danotorrey

@bernd @kroepke @mpfz0r @thll Do you have any idea how the duplicate users could be coming about? The migrations use the following method to ensure the user exists:

https://github.com/Graylog2/graylog2-server/blob/642342e2fd2b5e27db76b5ebd09db624ab95c1b9/graylog2-server/src/main/java/org/graylog2/migrations/MigrationHelpers.java#L88

As far as I know, in the two instances this has happened, it was for smaller version upgrades, and these system users should have already existed.

In the absence of understanding how the duplicate users are being created, perhaps we can add a migration to add a unique index to the username field to at least prevent the duplicate user from being created. Thoughts?

danotorrey avatar Jun 28 '22 20:06 danotorrey

Do you have any idea how the duplicate users could be coming about?

@danotorrey Do we have more logs? The stack trace above happens when the duplicate user already exists. It would be interesting to see the logs from the startup when the duplicate user got created.

But I also don't see how the code would allow the duplicate creation of the user. Perhaps it happens in some kind of race condition if there are two leader nodes in the cluster or when there are issues with the MongoDB connection.

We could improve the code to do an upsert instead of a fetch and update to avoid potential race conditions.

In the absence of understanding how the duplicate users are being created, perhaps we can add a migration to add a unique index to the username field to at least prevent the duplicate user from being created. Thoughts?

That sounds like a good idea. I am unsure if we tried that at some point in the past and failed. The problem is that the index creation fails if the collection already contains duplicate entries. That said, I can't think of an issue enabling the unique index on the username right now. The unique index is case-sensitive by default, which we need. The user service throws an error when loading a user that is duplicated. So it should be safe. Can you think of anything @mpfz0r?

bernd avatar Jun 29 '22 11:06 bernd

@bernd I am checking if the previous logs are available from the two recent reported instances. I also initially thought that the most likely cause was a race condition. Probably two parallel executions of the MigrationHelpers.ensureUser() method. Both get past the userService.load(userName) on line 98, and then both create the user.

But, the only part that does not make sense to me is, both instances of the error were 4.2.9 -> 4.3.x upgrade scenarios. Those users should have existed already (both the forwarder user and the sidecar user). How would the userService.load(userName) not return the user in that case? It does not fit with the race condition theory.

https://github.com/Graylog2/graylog2-server/blob/354afa04f96b250d583e41f7e4af7121dfb2e00b/graylog2-server/src/main/java/org/graylog2/migrations/MigrationHelpers.java#L98-L125

I think we should keep digging a bit more to find at least a possible scenario that could cause the duplicates. If it turns out to be a race condition, then I am leaning more towards making the ensureUser method more robust (e.g. use upsert instead) than the unique index. Making that method more robust would more specifically target the cause of the issue, vs. just protecting against the symptom (the unique index).

danotorrey avatar Jun 30 '22 14:06 danotorrey

But, the only part that does not make sense to me is, both instances of the error were 4.2.9 -> 4.3.x upgrade scenarios. Those users should have existed already (both the forwarder user and the sidecar user). How would the userService.load(userName) not return the user in that case? It does not fit with the race condition theory.

@danotorrey Good point! :+1: I didn't think of that.

I think we should keep digging a bit more to find at least a possible scenario that could cause the duplicates.

Sounds good, thank you! :pray:

bernd avatar Jun 30 '22 14:06 bernd

Hello,

just my 2 cents as we stumbled on errors (different than the ones above) while upgrading to 4.3, too.

In my opinition the duplication of the users was in prior upgrade and in 4.3 there is a change which stops migrations on errors.

In our instance for example this migration exited on error, but was added way before 4.3:

2022-06-27 21:17:01,437 WARN : org.graylog2.bootstrap.ServerBootstrap - Exception while running migrations
org.graylog.shaded.elasticsearch7.org.elasticsearch.ElasticsearchException: An error occurred:
        at org.graylog.storage.elasticsearch7.ElasticsearchClient.exceptionFrom(ElasticsearchClient.java:151) ~[?:?]
        at org.graylog.storage.elasticsearch7.ElasticsearchClient.execute(ElasticsearchClient.java:111) ~[?:?]
        at org.graylog.storage.elasticsearch7.ElasticsearchClient.execute(ElasticsearchClient.java:104) ~[?:?]
        at org.graylog.storage.elasticsearch7.views.migrations.V20200730000000_AddGl2MessageIdFieldAliasForEventsES7.addGl2MessageIdFieldAlias(V20200730000000_AddGl2MessageIdFieldAliasForEventsES7.java:53) ~[?:?]
        at org.graylog.plugins.views.migrations.V20200730000000_AddGl2MessageIdFieldAliasForEvents.upgrade(V20200730000000_AddGl2MessageIdFieldAliasForEvents.java:76) ~[graylog.jar:?]
        at org.graylog2.bootstrap.ServerBootstrap.lambda$runMigrations$0(ServerBootstrap.java:263) ~[graylog.jar:?]
        at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:422) ~[graylog.jar:?]
        at com.google.common.collect.RegularImmutableSortedSet.forEach(RegularImmutableSortedSet.java:88) ~[graylog.jar:?]
        at org.graylog2.bootstrap.ServerBootstrap.runMigrations(ServerBootstrap.java:261) ~[graylog.jar:?]
        at org.graylog2.bootstrap.ServerBootstrap.startCommand(ServerBootstrap.java:187) [graylog.jar:?]
        at org.graylog2.bootstrap.CmdLineTool.run(CmdLineTool.java:311) [graylog.jar:?]
        at org.graylog2.bootstrap.Main.main(Main.java:45) [graylog.jar:?]
Caused by: org.graylog.shaded.elasticsearch7.org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=mapper_parsing_exception, reason=Invalid [path] value [id] for field alias [gl2_message_id]: an alias must refer to an existing field in the mappings.]
        at org.graylog.shaded.elasticsearch7.org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187) ~[?:?]
        at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892) ~[?:?]
        at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869) ~[?:?]
        at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626) ~[?:?]
        at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1598) ~[?:?]
        at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1565) ~[?:?]
        at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.IndicesClient.putMapping(IndicesClient.java:353) ~[?:?]
        at org.graylog.storage.elasticsearch7.views.migrations.V20200730000000_AddGl2MessageIdFieldAliasForEventsES7.lambda$addGl2MessageIdFieldAlias$2(V20200730000000_AddGl2MessageIdFieldAliasForEventsES7.java:53) ~[?:?]
        at org.graylog.storage.elasticsearch7.ElasticsearchClient.execute(ElasticsearchClient.java:109) ~[?:?]
        ... 10 more
        Suppressed: org.graylog.shaded.elasticsearch7.org.elasticsearch.client.ResponseException: method [PUT], host [http://9200], URI [/gl-events*,gl-system-events*/_mapping?master_timeout=30s&ignore_unavailable=true&expand_wildcards=open%2Cclosed&allow_no_indices=true&ignore_throttled=false&timeout=30s], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"Invalid [path] value [id] for field alias [gl2_message_id]: an alias must refer to an existing field in the mappings."}],"type":"mapper_parsing_exception","reason":"Invalid [path] value [id] for field alias [gl2_message_id]: an alias must refer to an existing field in the mappings."},"status":400}
                at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302) ~[?:?]
                at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestClient.performRequest(RestClient.java:272) ~[?:?]
                at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestClient.performRequest(RestClient.java:246) ~[?:?]
                at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613) ~[?:?]
                at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1598) ~[?:?]
                at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1565) ~[?:?]
                at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.IndicesClient.putMapping(IndicesClient.java:353) ~[?:?]
                at org.graylog.storage.elasticsearch7.views.migrations.V20200730000000_AddGl2MessageIdFieldAliasForEventsES7.lambda$addGl2MessageIdFieldAlias$2(V20200730000000_AddGl2MessageIdFieldAliasForEventsES7.java:53) ~[?:?]
                at org.graylog.storage.elasticsearch7.ElasticsearchClient.execute(ElasticsearchClient.java:109) ~[?:?]
                at org.graylog.storage.elasticsearch7.ElasticsearchClient.execute(ElasticsearchClient.java:104) ~[?:?]
                at org.graylog.storage.elasticsearch7.views.migrations.V20200730000000_AddGl2MessageIdFieldAliasForEventsES7.addGl2MessageIdFieldAlias(V20200730000000_AddGl2MessageIdFieldAliasForEventsES7.java:53) ~[?:?]
                at org.graylog.plugins.views.migrations.V20200730000000_AddGl2MessageIdFieldAliasForEvents.upgrade(V20200730000000_AddGl2MessageIdFieldAliasForEvents.java:76) ~[graylog.jar:?]
                at org.graylog2.bootstrap.ServerBootstrap.lambda$runMigrations$0(ServerBootstrap.java:263) ~[graylog.jar:?]
                at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:422) ~[graylog.jar:?]
                at com.google.common.collect.RegularImmutableSortedSet.forEach(RegularImmutableSortedSet.java:88) ~[graylog.jar:?]
                at org.graylog2.bootstrap.ServerBootstrap.runMigrations(ServerBootstrap.java:261) ~[graylog.jar:?]
                at org.graylog2.bootstrap.ServerBootstrap.startCommand(ServerBootstrap.java:187) [graylog.jar:?]
                at org.graylog2.bootstrap.CmdLineTool.run(CmdLineTool.java:311) [graylog.jar:?]
                at org.graylog2.bootstrap.Main.main(Main.java:45) [graylog.jar:?]

As seen in git history the file and so the migration is very old, but did not error out on startup prior to 4.3: https://github.com/Graylog2/graylog2-server/commits/master/graylog-storage-elasticsearch7/src/main/java/org/graylog/storage/elasticsearch7/views/migrations/V20200730000000_AddGl2MessageIdFieldAliasForEventsES7.java

The migration behaviour was changed with this PR to exit(1) on error: https://github.com/Graylog2/graylog2-server/pull/11526

Previously failed migrations had no impact on running graylog-server

HenryTheSir avatar Jul 01 '22 08:07 HenryTheSir

EDIT: Discovered the ignore_migration_failures config value, which let us proceed with our upgrade.

Has there been any movement on this? We're facing a similar issue while upgrading 4.2.x -> 4.3.x. Our master node is failing to start due to duplicate user error:

java.lang.RuntimeException: There was more than one matching user for username graylog-sidecar. This should never happen.
        at org.graylog2.users.UserServiceImpl.load(UserServiceImpl.java:175) ~[graylog.jar:?]
        at org.graylog2.migrations.MigrationHelpers.ensureUser(MigrationHelpers.java:98) ~[graylog.jar:?]
        at org.graylog.plugins.sidecar.migrations.V20180323150000_AddSidecarUser.upgrade(V20180323150000_AddSidecarUser.java:60) ~[graylog.jar:?]
        at org.graylog2.bootstrap.ServerBootstrap.lambda$runMigrations$0(ServerBootstrap.java:264) ~[graylog.jar:?]
        at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:422) ~[graylog.jar:?]
        at com.google.common.collect.RegularImmutableSortedSet.forEach(RegularImmutableSortedSet.java:88) ~[graylog.jar:?]
        at org.graylog2.bootstrap.ServerBootstrap.runMigrations(ServerBootstrap.java:261) ~[graylog.jar:?]
        at org.graylog2.bootstrap.ServerBootstrap.startCommand(ServerBootstrap.java:187) [graylog.jar:?]
        at org.graylog2.bootstrap.CmdLineTool.run(CmdLineTool.java:312) [graylog.jar:?]
        at org.graylog2.bootstrap.Main.main(Main.java:45) [graylog.jar:?]

brosef avatar Jan 30 '23 18:01 brosef

I think we should keep digging a bit more to find at least a possible scenario that could cause the duplicates. If it turns out to be a race condition, then I am leaning more towards making the ensureUser method more robust (e.g. use upsert instead) than the unique index. Making that method more robust would more specifically target the cause of the issue, vs. just protecting against the symptom (the unique index).

+1

boosty avatar Feb 02 '23 08:02 boosty

When digging for possible causes of the bug, we should focus on changes that happened between 4.2.x -> 4.3.x.

Also, while both the upsert and making the index unique makes sense, we should not forget that we have to resolve the duplicate users first. We should try and keep the user that has a token assigned, because that's likely the one that is in use. If both have tokens, we can only rename one of them.

mpfz0r avatar Feb 02 '23 08:02 mpfz0r

What I have gleaned from the code:

  • UserService.save already performs an upsert via PersistedServiceImpl.save. No need to change anything there.
  • Access tokens are mapped to users via username. If there are duplicate users, then that mapping is also ambiguous. I.e. it doesn't matter which of the duplicates is renamed.

patrickmann avatar Feb 10 '23 15:02 patrickmann