OpenMetadata
Access denied error during ingestion causes underlying objects to be deleted
Affected module
Ingestion
Describe the bug
When the ingestion-bot receives an Access Denied error during ingestion, it deletes all objects belonging to that service. For example, when ingesting tables from AWS Athena, an access failure caused by an IAM policy leads to the deletion of all tables of this service.
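The failure mode can be modelled in a few lines. This is a hypothetical sketch, not OpenMetadata's code. `AccessDenied`, `list_tables`, `fetch_table_names` and `tables_to_soft_delete` are illustrative names. The sketch assumes two things: a failed listing is only logged and treated as an empty result, as the WARNING in the log suggests, and the mark-deleted step removes every previously ingested table that was not seen in the current run.

```python
"""Hypothetical model of the reported failure mode (not OpenMetadata's code)."""
import logging

logger = logging.getLogger("ingestion-sketch")


class AccessDenied(Exception):
    """Stand-in for the AccessDeniedException surfaced by Athena/Glue."""


def list_tables(schema: str, allowed: bool) -> list[str]:
    # Hypothetical listing call; raises when IAM denies glue:GetTables.
    if not allowed:
        raise AccessDenied(f"not authorized to perform glue:GetTables on {schema}")
    return ["orders", "customers"]


def fetch_table_names(schema: str, allowed: bool) -> list[str]:
    # Assumed behaviour: the error is only logged as a WARNING and an empty
    # list is returned, so callers cannot tell "denied" from "no tables".
    try:
        return list_tables(schema, allowed)
    except AccessDenied as exc:
        logger.warning("Fetching tables names failed for schema %s due to - %s", schema, exc)
        return []


def tables_to_soft_delete(known: set[str], seen: list[str]) -> set[str]:
    # Naive mark-deleted step: anything known but not seen this run is deleted.
    return known - set(seen)


known_tables = {"orders", "customers"}
seen = fetch_table_names("sales", allowed=False)
print(sorted(tables_to_soft_delete(known_tables, seen)))  # ['customers', 'orders']
```

Under these assumptions, one denied listing call is indistinguishable from a schema that really became empty, so every known table is selected for soft deletion.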
To Reproduce
- Ingest some data from AWS Athena
- Block access for the user used for ingestion
- Run the ingestion workflow again (objects are soft-deleted)
- Restore access for the ingestion user
- Run the ingestion workflow again (tables are restored)
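One way to block access in step 2 is an explicit deny on `glue:GetTables`, the action named in the log. The log shows the deny coming from a service control policy. This sketch assumes that an identity-based policy attached to the ingestion role has the same effect on the Athena listing call. It only builds and prints the policy document. Attaching it, and detaching it again for step 4, is left to your usual IAM tooling. The `Sid` value is an illustrative name.

```python
import json

# Hypothetical deny policy for reproducing step 2. "Sid" is illustrative;
# "Resource": "*" denies table listing on every Glue catalog resource.
deny_glue_get_tables = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BlockGlueTableListing",
            "Effect": "Deny",
            "Action": ["glue:GetTables"],
            "Resource": "*",
        }
    ],
}

print(json.dumps(deny_glue_get_tables, indent=2))
```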
Expected behavior
An Access Denied error should not cause any objects to be deleted. The workflow should leave existing objects untouched and only report the exception.
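A minimal sketch of the expected behavior, under the same hypothetical model: the listing result records whether the call failed, and the mark-deleted step refuses to infer deletions from an incomplete listing. `SchemaListing`, `fetch_table_names` and `tables_to_soft_delete` are illustrative names, not OpenMetadata APIs.

```python
"""Hypothetical sketch of the expected behaviour (not OpenMetadata's code)."""
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("ingestion-sketch")


class AccessDenied(Exception):
    """Stand-in for the AccessDeniedException surfaced by Athena/Glue."""


@dataclass
class SchemaListing:
    tables: list = field(default_factory=list)
    failed: bool = False  # True when the listing call raised


def list_tables(schema: str, allowed: bool) -> list:
    # Hypothetical listing call; raises when IAM denies glue:GetTables.
    if not allowed:
        raise AccessDenied(f"not authorized to perform glue:GetTables on {schema}")
    return ["orders"]


def fetch_table_names(schema: str, allowed: bool) -> SchemaListing:
    # Report the failure to the caller instead of returning an empty list.
    try:
        return SchemaListing(tables=list_tables(schema, allowed))
    except AccessDenied as exc:
        logger.warning("Fetching tables names failed for schema %s due to - %s", schema, exc)
        return SchemaListing(failed=True)


def tables_to_soft_delete(known: set, listing: SchemaListing) -> set:
    # Never infer deletions from a listing that did not complete.
    if listing.failed:
        return set()
    return known - set(listing.tables)


known_tables = {"orders", "customers"}
print(tables_to_soft_delete(known_tables, fetch_table_names("sales", allowed=False)))  # set()
print(sorted(tables_to_soft_delete(known_tables, fetch_table_names("sales", allowed=True))))  # ['customers']
```

The design choice is to make "listing failed" a distinct state rather than an empty result. A successful listing that genuinely omits a table still triggers soft deletion as before.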
Version:
- OS: Linux
- Python version:
- OpenMetadata version: 1.2.4
Additional context
[2024-04-16, 10:27:25 UTC] {common_db_source.py:331} WARNING - Fetching tables names failed for schema ******** due to - An error occurred (MetadataException) when calling the ListTableMetadata operation: User: arn:aws:sts::*******:assumed-role/**********/botocore-session-1713263123 is not authorized to perform: glue:GetTables on resource: arn:aws:glue:eu-central-1:********:catalog with an explicit deny in a service control policy (Service: AmazonDataCatalog; Status Code: 400; Error Code: AccessDeniedException; Request ID: 497ae083-db2f-4b63-af08-******; Proxy: null)
Stack trace:
[2024-04-16, 10:27:51 UTC] {common_db_source.py:331} WARNING - Fetching tables names failed for schema ***** due to - An error occurred (MetadataException) when calling the ListTableMetadata operation: User: arn:aws:sts::******:assumed-role/*******/botocore-session-1713263123 is not authorized to perform: glue:GetTables on resource: arn:aws:glue:eu-central-1:*****:catalog with an explicit deny in a service control policy (Service: AmazonDataCatalog; Status Code: 400; Error Code: AccessDeniedException; Request ID: a82d6c65-fe89-47de-9bac-*****; Proxy: null)
[2024-04-16, 10:27:51 UTC] {common_db_source.py:334} DEBUG - Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/pyathena/common.py", line 324, in _list_table_metadata
    response = retry_api_call(
  File "/home/airflow/.local/lib/python3.9/site-packages/pyathena/util.py", line 85, in retry_api_call
    return retry(func, *args, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/airflow/.local/lib/python3.9/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/home/airflow/.local/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/botocore/client.py", line 964, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.MetadataException: An error occurred (MetadataException) when calling the ListTableMetadata operation: User: arn:aws:sts::*******:assumed-role/*******/botocore-session-1713263123 is not authorized to perform: glue:GetTables on resource: arn:aws:glue:eu-central-1:**********:catalog with an explicit deny in a service control policy (Service: AmazonDataCatalog; Status Code: 400; Error Code: AccessDeniedException; Request ID: a82d6c65-fe89-47de-9bac-******; Proxy: null)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/source/database/common_db_source.py", line 278, in get_tables_name_and_type
    for table_and_type in self.query_table_names_and_types(schema_name):
  File "/home/airflow/.local/lib/python3.9/site-packages/metadata/ingestion/source/database/athena/metadata.py", line 186, in query_table_names_and_types
    for name in self.inspector.get_table_names(schema_name)
  File "/home/airflow/.local/lib/python3.9/site-packages/sqlalchemy/engine/reflection.py", line 266, in get_table_names
    return self.dialect.get_table_names(
  File "/home/airflow/.local/lib/python3.9/site-packages/pyathena/sqlalchemy/base.py", line 1065, in get_table_names
    tables = self._get_tables(connection, schema, **kw)
  File "<string>", line 2, in _get_tables
  File "/home/airflow/.local/lib/python3.9/site-packages/sqlalchemy/engine/reflection.py", line 55, in cache
    ret = fn(self, con, *args, **kw)
  File "/home/airflow/.local/lib/python3.9/site-packages/pyathena/sqlalchemy/base.py", line 1052, in _get_tables
    return cursor.list_table_metadata(schema_name=schema)
  File "/home/airflow/.local/lib/python3.9/site-packages/pyathena/common.py", line 349, in list_table_metadata
    next_token, response = self._list_table_metadata(
  File "/home/airflow/.local/lib/python3.9/site-packages/pyathena/common.py", line 332, in _list_table_metadata
    raise OperationalError(*e.args) from e
pyathena.error.OperationalError: An error occurred (MetadataException) when calling the ListTableMetadata operation: User: arn:aws:sts::*******:assumed-role/******/botocore-session-1713263123 is not authorized to perform: glue:GetTables on resource: arn:aws:glue:eu-central-1:*******:catalog with an explicit deny in a service control policy (Service: AmazonDataCatalog; Status Code: 400; Error Code: AccessDeniedException; Request ID: a82d6c65-fe89-47de-9bac-******; Proxy: null)
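The traceback shows that pyathena re-raises the botocore `MetadataException` as `OperationalError(*e.args) from e`. The original error is therefore reachable through `__cause__`. Note that the botocore error class is `MetadataException`, and `AccessDeniedException` appears only inside the message text. The sketch below is a hypothetical helper for recognising this case. It walks the exception chain and matches on message text, which is an assumption and inherently brittle. `MetadataException` and `OperationalError` here are local stand-ins, not the real botocore/pyathena classes.

```python
"""Hypothetical helper: detect an access-denied error anywhere in an exception chain."""
from typing import Iterator, Optional


def iter_exception_chain(exc: Optional[BaseException]) -> Iterator[BaseException]:
    # Follow explicit causes (raise ... from e) first, then implicit context;
    # the id() set guards against cycles in the chain.
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        yield exc
        exc = exc.__cause__ or exc.__context__


def is_access_denied(exc: BaseException) -> bool:
    # Assumption: the marker text appears in the message, as in the log above.
    return any("AccessDeniedException" in str(e) for e in iter_exception_chain(exc))


# Local stand-ins for botocore's MetadataException and pyathena's OperationalError.
class MetadataException(Exception):
    pass


class OperationalError(Exception):
    pass


def list_table_metadata() -> None:
    try:
        # Abbreviated version of the message from the log.
        raise MetadataException(
            "An error occurred (MetadataException) when calling the ListTableMetadata "
            "operation (Error Code: AccessDeniedException)"
        )
    except MetadataException as e:
        # Same re-raise pattern as pyathena/common.py in the traceback.
        raise OperationalError(*e.args) from e


try:
    list_table_metadata()
except OperationalError as err:
    print(is_access_denied(err))  # True
```

A caller in the ingestion source could use such a check to mark the schema listing as failed rather than empty, instead of only logging a WARNING.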