[GCP] Data Catalog improvements

Open mzueva opened this issue 4 months ago • 0 comments

Background

At the moment, we observe several issues with the data catalog, including:

  • low performance for large indices
  • instability of security logs
  • duplication of indices

Approach

The following solutions shall be implemented:

  • instability of security logs is addressed via https://github.com/epam/cloud-pipeline/issues/3947, as audit logs are now stored and managed using GCP APIs
  • low performance for large indices:
    • allow files to be excluded from indexing
    • support multi-node deployment for Elasticsearch on GKE to improve service performance
  • duplication of indices: introduce additional cleanup logic during index management
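
As an illustration of two of the items above (file exclusion and index cleanup), here is a minimal Python sketch. It is not the project's implementation. The glob-style pattern syntax, the `EXCLUDE_PATTERNS` values, the `<prefix>-<timestamp>` index naming scheme, and both function names are assumptions made for the example. The issue does not specify any of them.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns; the issue does not define a pattern syntax.
EXCLUDE_PATTERNS = ["*.tmp", "logs/*", "*/.cache/*"]


def is_excluded(path, patterns=EXCLUDE_PATTERNS):
    """Return True if the path matches any exclusion glob.

    Note that fnmatch-style '*' also matches '/' characters.
    """
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def find_duplicate_indices(index_names):
    """Return every index name except the newest one in each group.

    Assumes hypothetical names of the form '<prefix>-<timestamp>', where
    the timestamp suffix sorts lexicographically (e.g. YYYYMMDD).
    """
    groups = defaultdict(list)
    for name in index_names:
        prefix, _, stamp = name.rpartition("-")
        groups[prefix].append((stamp, name))

    stale = []
    for entries in groups.values():
        entries.sort()  # oldest first, newest last
        stale.extend(name for _, name in entries[:-1])
    return sorted(stale)
```

With these assumptions, `is_excluded("data/file.tmp")` is true, and `find_duplicate_indices(["storage-1-20250601", "storage-1-20250603"])` returns only the older index as a cleanup candidate. The actual deletion step is left out on purpose, since it depends on how indices and aliases are managed in the deployment.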

mzueva, Jun 04 '25 12:06