cloud-pipeline
[GCP] Data Catalog improvements
Background
At the moment, we observe several issues with the Data Catalog, including:
- low performance for large indices
- instability of security logs
- duplication of indices
Approach
The following solutions shall be implemented:
- instability of security logs is addressed via https://github.com/epam/cloud-pipeline/issues/3947, as audit logs are now stored and managed using GCP APIs
- low performance for large indices:
  - allow files to be excluded from indexing
  - support multi-node deployment of Elasticsearch on GKE to improve service performance
- duplication of indices: introduce additional cleanup logic during index management
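
To illustrate two of the items above (excluding files from indexing and cleaning up duplicate indices), here is a minimal sketch. It is not the actual implementation: the glob-based exclusion format, the `EXCLUDE_PATTERNS` values, and the `<base>-<sortable timestamp>` index naming scheme are all assumptions made for the example, and the function names are hypothetical.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns; the issue does not specify a format.
EXCLUDE_PATTERNS = ["*.tmp", "*/.git/*", "logs/*"]


def is_excluded(path, patterns=EXCLUDE_PATTERNS):
    """Return True if the file path matches any exclusion glob."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def duplicate_indices(index_names):
    """Return every index except the newest one in each group.

    Assumes (hypothetically) that names look like '<base>-<timestamp>',
    where the timestamp sorts lexicographically, e.g. '20240201'.
    """
    groups = defaultdict(list)
    for name in index_names:
        # Split on the last '-' to separate the base from the timestamp.
        base, _, suffix = name.rpartition("-")
        groups[base].append((suffix, name))

    stale = []
    for entries in groups.values():
        entries.sort()
        # Keep the last (newest) entry; everything before it is a duplicate.
        stale.extend(name for _, name in entries[:-1])
    return sorted(stale)
```

Under these assumptions, the indexer would skip any file for which `is_excluded` returns True, and the index management routine would delete the names returned by `duplicate_indices` after the current index has been confirmed healthy.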