Access denied error during ingestion causes underlying objects to be deleted

Open Iture opened this issue 10 months ago • 0 comments

Affected module Ingestion

Describe the bug When the ingestion-bot receives an Access Denied error during ingestion, it deletes all objects belonging to that service. For example: during the ingestion of tables from AWS Athena, a loss of access (caused by an IAM policy) results in the deletion of all tables from this service.

To Reproduce

  1. Ingest some data from AWS Athena
  2. Block access for the user used for ingestion
  3. Run ingestion workflow again (objects are soft deleted)
  4. Restore access for the ingestion user
  5. Run ingestion workflow (tables are restored).
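
One plausible way the steps above end in a soft delete, offered as a hypothesis and not as a reading of the OpenMetadata source: if a failed listing is only logged and then treated like an empty result, a later pass that diffs stored tables against the tables seen in the run will flag everything. All names below (`run_ingestion`, `list_tables`, `stored_tables`) are hypothetical.

```python
from typing import Callable, Iterable, Set


def run_ingestion(stored: Set[str], list_tables: Callable[[], Iterable[str]]) -> Set[str]:
    """Return the tables that would be soft deleted after one run."""
    seen: Set[str] = set()
    try:
        seen.update(list_tables())
    except Exception as exc:
        # The error is only logged, then the run continues with nothing seen.
        print(f"WARNING - Fetching tables names failed due to - {exc}")
    # Anything stored but not seen in this run is assumed dropped at the source.
    return stored - seen


def denied() -> Iterable[str]:
    # Stand-in for a listing call rejected by an access policy.
    raise PermissionError("AccessDeniedException: not authorized to perform glue:GetTables")


stored_tables = {"db.orders", "db.customers"}
to_delete = run_ingestion(stored_tables, denied)
print(sorted(to_delete))
```

Under this assumption, restoring access makes the next run see the tables again, which would match the restore observed in step 5.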

Expected behavior An access denied error should not cause any objects to be deleted. The workflow should leave existing objects untouched and only report the exception.
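
A minimal sketch of the behavior requested here, assuming deletion candidates are computed per schema by comparing stored tables with freshly listed ones. This is an illustration only, not the OpenMetadata implementation; `tables_to_soft_delete`, `fake_list`, and the schema names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set


def tables_to_soft_delete(
    stored: Dict[str, Set[str]],
    list_tables: Callable[[str], Iterable[str]],
) -> Set[str]:
    """Diff stored vs. listed tables, skipping schemas whose listing failed."""
    to_delete: Set[str] = set()
    for schema, known in stored.items():
        try:
            seen = set(list_tables(schema))
        except Exception as exc:
            # Report the failure and leave this schema's tables untouched.
            print(f"WARNING - skipping deletion for schema {schema}: {exc}")
            continue
        to_delete |= {f"{schema}.{name}" for name in known - seen}
    return to_delete


def fake_list(schema: str) -> Iterable[str]:
    # Hypothetical source: one schema is blocked, the other lists normally.
    if schema == "blocked":
        raise PermissionError("AccessDeniedException")
    return ["orders"]


stored = {"blocked": {"a", "b"}, "sales": {"orders", "old_orders"}}
result = tables_to_soft_delete(stored, fake_list)
print(sorted(result))
```

The design choice is that a failed listing means "unknown", not "empty": only schemas that were listed successfully contribute deletion candidates.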

Version:

  • OS: Linux
  • Python version:
  • OpenMetadata version: 1.2.4

Additional context [2024-04-16, 10:27:25 UTC] {common_db_source.py:331} WARNING - Fetching tables names failed for schema ******** due to - An error occurred (MetadataException) when calling the ListTableMetadata operation: User: arn:aws:sts::*******:assumed-role/**********/botocore-session-1713263123 is not authorized to perform: glue:GetTables on resource: arn:aws:glue:eu-central-1:********:catalog with an explicit deny in a service control policy (Service: AmazonDataCatalog; Status Code: 400; Error Code: AccessDeniedException; Request ID: 497ae083-db2f-4b63-af08-******; Proxy: null)

Stack trace:

[2024-04-16, 10:27:51 UTC] {common_db_source.py:331} WARNING - Fetching tables names failed for schema ***** due to - An error occurred (MetadataException) when calling the ListTableMetadata operation: User: arn:aws:sts::******:assumed-role/*******/botocore-session-1713263123 is not authorized to perform: glue:GetTables on resource: arn:aws:glue:eu-central-1:*****:catalog with an explicit deny in a service control policy (Service: AmazonDataCatalog; Status Code: 400; Error Code: AccessDeniedException; Request ID: a82d6c65-fe89-47de-9bac-*****; Proxy: null)
[2024-04-16, 10:27:51 UTC] {common_db_source.py:334} DEBUG - Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/pyathena/common.py", line 324, in _list_table_metadata
    response = retry_api_call(
  File "/home/airflow/.local/lib/python3.9/site-packages/pyathena/util.py", line 85, in retry_api_call
    return retry(func, *args, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/airflow/.local/lib/python3.9/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/home/airflow/.local/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/botocore/client.py", line 964, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.MetadataException: An error occurred (MetadataException) when calling the ListTableMetadata operation: User: arn:aws:sts::*******:assumed-role/*******/botocore-session-1713263123 is not authorized to perform: glue:GetTables on resource: arn:aws:glue:eu-central-1:**********:catalog with an explicit deny in a service control policy (Service: AmazonDataCatalog; Status Code: 400; Error Code: AccessDeniedException; Request ID: a82d6c65-fe89-47de-9bac-******; Proxy: null)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/source/database/common_db_source.py", line 278, in get_tables_name_and_type
    for table_and_type in self.query_table_names_and_types(schema_name):
  File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/source/database/athena/metadata.py", line 186, in query_table_names_and_types
    for name in self.inspector.get_table_names(schema_name)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqlalchemy/engine/reflection.py", line 266, in get_table_names
    return self.dialect.get_table_names(
  File "/home/airflow/.local/lib/python3.9/site-packages/pyathena/sqlalchemy/base.py", line 1065, in get_table_names
    tables = self._get_tables(connection, schema, **kw)
  File "<string>", line 2, in _get_tables
  File "/home/airflow/.local/lib/python3.9/site-packages/sqlalchemy/engine/reflection.py", line 55, in cache
    ret = fn(self, con, *args, **kw)
  File "/home/airflow/.local/lib/python3.9/site-packages/pyathena/sqlalchemy/base.py", line 1052, in _get_tables
    return cursor.list_table_metadata(schema_name=schema)
  File "/home/airflow/.local/lib/python3.9/site-packages/pyathena/common.py", line 349, in list_table_metadata
    next_token, response = self._list_table_metadata(
  File "/home/airflow/.local/lib/python3.9/site-packages/pyathena/common.py", line 332, in _list_table_metadata
    raise OperationalError(*e.args) from e
pyathena.error.OperationalError: An error occurred (MetadataException) when calling the ListTableMetadata operation: User: arn:aws:sts::*******:assumed-role/******/botocore-session-1713263123 is not authorized to perform: glue:GetTables on resource: arn:aws:glue:eu-central-1:*******:catalog with an explicit deny in a service control policy (Service: AmazonDataCatalog; Status Code: 400; Error Code: AccessDeniedException; Request ID: a82d6c65-fe89-47de-9bac-******; Proxy: null)
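
In the trace above, the exception type that reaches the ingestion code is a generic `OperationalError`, and the access-denied detail appears only in the message text and in the chained cause. A rough sketch of how such a failure could be recognized, assuming a substring check on the message is acceptable; `is_access_denied` and the two stand-in exception classes are hypothetical and are used so the example runs without botocore or pyathena installed.

```python
from typing import Optional


def is_access_denied(exc: Optional[BaseException]) -> bool:
    """True if exc, or any exception it was raised from, mentions AccessDeniedException."""
    while exc is not None:
        if "AccessDeniedException" in str(exc):
            return True
        exc = exc.__cause__  # follow the "direct cause" chain
    return False


# Stand-ins for botocore's MetadataException and pyathena's OperationalError.
class FakeMetadataException(Exception):
    pass


class FakeOperationalError(Exception):
    pass


def raise_chained() -> None:
    try:
        raise FakeMetadataException("Status Code: 400; Error Code: AccessDeniedException")
    except FakeMetadataException as e:
        # The wrapper message omits the detail on purpose, to exercise the chain walk.
        raise FakeOperationalError("listing failed") from e


try:
    raise_chained()
except FakeOperationalError as err:
    denied = is_access_denied(err)

print(denied)
```

A check like this could gate any deletion step, so that a permission failure is reported without being interpreted as "the tables no longer exist".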

Iture, Apr 16 '24 20:04