
Add security index metadata -> metadata_flattened field migration

Open jfreden opened this issue 11 months ago • 0 comments

This PR adds code to migrate the metadata field to the metadata_flattened field in the main security index. The metadata_flattened field is of type flattened and already exists to enable queries against API key metadata. After this PR is merged, metadata will be indexed in the metadata_flattened field for privileges, users, API keys (already available), role mappings and roles.
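A minimal sketch of the per-document transformation follows. It assumes each security document's `_source` can be treated as a `Map`. The class and method names are hypothetical, and the description does not say how the migration rewrites documents in bulk. The sketch only shows that `metadata` is copied into `metadata_flattened` while the original field is kept.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not Elasticsearch code): copies a document's
// "metadata" object into "metadata_flattened", keeping the original field.
final class MetadataFlattenedMigration {

    static final String METADATA = "metadata";
    static final String METADATA_FLATTENED = "metadata_flattened";

    // Returns a migrated copy of the source, or the input unchanged when there
    // is no metadata or metadata_flattened is already present (as for API keys).
    static Map<String, Object> migrate(Map<String, Object> source) {
        Object metadata = source.get(METADATA);
        if (metadata == null || source.containsKey(METADATA_FLATTENED)) {
            return source;
        }
        Map<String, Object> migrated = new HashMap<>(source);
        migrated.put(METADATA_FLATTENED, metadata);
        return migrated;
    }
}
```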

To migrate the field, the PersistentTasksExecutor is used. It is responsible for executing restartable tasks that can survive the disappearance of the coordinating and executor nodes, and it guarantees that only a single instance of a task with a given id runs at any one time.
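The single-instance guarantee can be illustrated with a small in-process model in which starting a task atomically claims its id. This is only an analogy. In Elasticsearch the guarantee is provided cluster-wide by the persistent tasks framework, and `SingleInstanceTaskRegistry` and the `security-migration` task id used here are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simplified single-JVM model of "one running instance per task id".
// Not the Elasticsearch persistent tasks API.
final class SingleInstanceTaskRegistry {

    private final Set<String> running = ConcurrentHashMap.newKeySet();

    // Atomically claims the id; returns false if an instance is already running.
    boolean tryStart(String taskId) {
        return running.add(taskId);
    }

    // Releases the id so that a later (for example restarted) instance can run.
    void complete(String taskId) {
        running.remove(taskId);
    }
}
```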

The migration is triggered by a security index state change, so on every security index state change the cluster state is checked to see whether the migration has already happened. If a new security index is created, the migration is skipped and the cluster state is populated with the migration status set to completed.

The migration status is stored as custom metadata in the security index's IndexMetadata in the cluster state.
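The trigger logic described in the two paragraphs above can be sketched as a pure function of the index state. The status key, its `completed` value and the `Action` names are hypothetical. The real code reads the status from the security index's `IndexMetadata` in the cluster state rather than from a plain `Map`.

```java
import java.util.Map;

// Hypothetical model of the decision made on every security index state change.
final class MigrationTrigger {

    // Hypothetical key and value for the status kept in the index's custom metadata.
    static final String STATUS_KEY = "metadata_migration_status";
    static final String COMPLETED = "completed";

    enum Action { NOTHING_TO_DO, MARK_COMPLETED, START_MIGRATION }

    static Action onSecurityIndexStateChange(boolean indexNewlyCreated, Map<String, String> customMetadata) {
        if (COMPLETED.equals(customMetadata.get(STATUS_KEY))) {
            return Action.NOTHING_TO_DO; // already migrated
        }
        if (indexNewlyCreated) {
            // A new index holds no legacy documents: record completion instead of migrating.
            return Action.MARK_COMPLETED;
        }
        return Action.START_MIGRATION;
    }
}
```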

With this change, the APIs that write privileges, users, role mappings and roles are also updated to dual-write the metadata field to metadata_flattened. This means we also have to check that the new metadata-flattened node feature exists, so that the new field is not written in a mixed cluster, where it would cause old nodes to crash.
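A minimal sketch of the guarded dual write follows, assuming each node's advertised features are available as a set of strings. The feature id is a placeholder because the description does not name it, and `DualWrite` is a hypothetical helper rather than the code path used by the security APIs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper illustrating a dual write gated on a cluster-wide node feature.
final class DualWrite {

    // Placeholder feature id; the real feature name is not given in the description.
    static final String METADATA_FLATTENED_FEATURE = "hypothetical.metadata_flattened";

    // True only if every node in the cluster advertises the feature.
    static boolean clusterSupports(List<Set<String>> nodeFeatures, String feature) {
        return nodeFeatures.stream().allMatch(features -> features.contains(feature));
    }

    // Always writes metadata; adds metadata_flattened only when no old node can receive it.
    static Map<String, Object> buildSource(Map<String, Object> metadata, List<Set<String>> nodeFeatures) {
        Map<String, Object> source = new HashMap<>();
        source.put("metadata", metadata);
        if (clusterSupports(nodeFeatures, METADATA_FLATTENED_FEATURE)) {
            source.put("metadata_flattened", metadata);
        }
        return source;
    }
}
```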

The PR also adds:

  • A new node feature, security.metadata_migrated, to prevent the migration from running in mixed clusters.
  • A new transport version ADD_METADATA_FLATTENED_TO_ROLES to control the serialization of role descriptors.
  • A new field in the Query User API that allows queries against metadata, for verification purposes.
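A transport version such as ADD_METADATA_FLATTENED_TO_ROLES typically gates a field so that it is written and read only when the other node's version is at or after the one that introduced it. The sketch below shows that pattern with plain `DataOutputStream`/`DataInputStream` and integer versions. The version numbers, the `Role` record and its `extraField` are hypothetical stand-ins, since the description does not say which role descriptor field is gated or how.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class RoleDescriptorWire {

    // Hypothetical version ids; real TransportVersion ids are different.
    static final int BEFORE_METADATA_FLATTENED = 1;
    static final int ADD_METADATA_FLATTENED_TO_ROLES = 2;

    // Hypothetical role model with one field that only newer versions understand.
    record Role(String name, String extraField) {}

    static byte[] write(Role role, int peerVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(role.name());
            // Only peers that know the new field are sent it.
            if (peerVersion >= ADD_METADATA_FLATTENED_TO_ROLES) {
                out.writeBoolean(role.extraField() != null);
                if (role.extraField() != null) {
                    out.writeUTF(role.extraField());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Role read(byte[] data, int peerVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String name = in.readUTF();
            String extra = null;
            // Mirror the write side exactly, or the stream gets out of sync.
            if (peerVersion >= ADD_METADATA_FLATTENED_TO_ROLES && in.readBoolean()) {
                extra = in.readUTF();
            }
            return new Role(name, extra);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```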

Notes

  • I investigated whether the status of the migration job (completed or not) could be kept in the _meta field of the security index mapping. After some testing, I don't think that is a pattern we want to use, primarily because of how the _meta field is updated. An update fully replaces the object field, which means the mappings must first be read, then merged with the new properties, and then written back. This whole process is not atomic, so _meta is not a good place to track the status of a job because of the risk of race conditions. It is also not a great place to keep metadata because of how the mappings are currently handled by the SystemIndexMappingUpdateService, which overwrites the mapping (including _meta) with the full mapping from the SystemIndexDescriptor. If the long-term goal is to move over to using only the SystemIndexMappingUpdateService, in my opinion it's better not to try to hack around its implementation.
  • Logic could be added to the metadata query APIs (Query User, Query Roles) to check whether the migration has happened and to throw an error if the field is used in a query. This would prevent confusion for customers running mixed clusters or whose migration failed for some reason. I have not added this to the Query User API.
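The race condition in the first note is a classic lost update. The sketch below models `_meta` as a map that can only be replaced as a whole, which is an assumption for illustration and not how the mapping API is implemented. Two writers that read the same snapshot and each write back a merged copy lose one of the updates, whereas a compare-and-set loop (here `AtomicReference.updateAndGet`) retries the merge against the latest value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of an object field that is updated by full replacement.
final class MetaUpdates {

    // Non-atomic pattern: merge one property into a previously read snapshot.
    static Map<String, String> merge(Map<String, String> snapshot, String key, String value) {
        Map<String, String> merged = new HashMap<>(snapshot);
        merged.put(key, value);
        return Map.copyOf(merged);
    }

    // Atomic alternative: updateAndGet re-runs the merge if another writer won the race.
    static void casMerge(AtomicReference<Map<String, String>> meta, String key, String value) {
        meta.updateAndGet(current -> merge(current, key, value));
    }
}
```

For example, if writer A reads `{}` and writes `{migration=completed}`, while writer B read `{}` before that write and then writes `{version=2}`, the migration status is silently gone.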

Future Work

  • In the next major version, this approach would allow us to drop the metadata field in favour of the metadata_flattened field.
  • Add a Query Roles API that uses the new field.
  • Remove usage of metadata entirely.

TODO

  • Add a BWC test that checks the security index directly for models without a query API
  • Add a unit test for the new executor class
  • Update user docs to include metadata
  • Add and describe the info logs added to track the status of this job.
  • Mark metadata for removal in 9.x

jfreden · Mar 21 '24 10:03