dbt-spark
[ADAP-1085] [Bug] When using iceberg format, dbt docs generate is unable to populate the columns information
Is this a new bug in dbt-spark?
- [X] I believe this is a new bug in dbt-spark
- [X] I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
When using the Iceberg table format, dbt docs generate produces an empty catalog.json and hence no column information appears in the documentation.
Expected Behavior
dbt docs generate should produce a properly populated catalog.json.
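For reference, a properly populated catalog should contain one entry per relation under nodes. An illustrative sketch based on the dbt catalog v1 schema (the column names and types here are examples, not our actual schema):

"nodes": {
  "model.c360.stg_clickstream": {
    "metadata": {"type": "table", "schema": "c360bronze", "name": "stg_clickstream"},
    "columns": {
      "event_id": {"type": "string", "index": 1, "name": "event_id", "comment": null}
    },
    "stats": {}
  }
}

Instead, nodes and sources come back empty (see Additional Context below).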
Steps To Reproduce
- Configure EMR to work with Iceberg and the AWS Glue Data Catalog (a configuration sketch is shown after this list)
- Set up the Spark Thrift server
- Run the dbt project on EMR through the Thrift server
- Run dbt docs generate
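For context, the setup looks roughly like the following. These are minimal sketches, not the exact values from our environment: the catalog name glue_catalog, the warehouse path, the host, and the port are illustrative placeholders.

Spark/Iceberg configuration (spark-defaults style, following the standard Iceberg-on-Glue properties):

spark.sql.extensions                         org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.glue_catalog               org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.glue_catalog.catalog-impl  org.apache.iceberg.aws.glue.GlueCatalog
spark.sql.catalog.glue_catalog.io-impl       org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.glue_catalog.warehouse     s3://<warehouse-bucket>/iceberg/

dbt profiles.yml using the thrift connection method (the profile and target names match the log below):

c360:
  target: dev
  outputs:
    dev:
      type: spark
      method: thrift
      host: <emr-primary-node-dns>   # placeholder
      port: 10001                    # adjust to your Thrift server port
      schema: c360bronze
      threads: 1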
Relevant log output
[0m02:50:24.631713 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa619c86bb0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa617c572b0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa617c57a60>]}
============================== 02:50:24.638185 | c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35 ==============================
[0m02:50:24.638185 [info ] [MainThread]: Running with dbt=1.7.4
[0m02:50:24.639539 [debug] [MainThread]: running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'debug': 'False', 'fail_fast': 'False', 'log_path': '/home/ec2-user/environment/dbtproject/dags/dbt_blueprint/c360-datalake/logs', 'version_check': 'True', 'profiles_dir': '/home/ec2-user/.dbt', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'log_format': 'default', 'invocation_command': 'dbt docs generate --vars {"day": "31","hour": "0","month": "12","raw_bucket":"c360-raw-data-*****-us-east-1","ts": "2023-12-31T00:00:00+00:00","year": "2023"}', 'introspect': 'True', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'target_path': 'None', 'static_parser': 'True', 'send_anonymous_usage_stats': 'True'}
[0m02:50:24.958174 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa617bece50>]}
[0m02:50:25.203736 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6177c2af0>]}
[0m02:50:25.204633 [info ] [MainThread]: Registered adapter: spark=1.7.0
[0m02:50:25.223511 [debug] [MainThread]: checksum: 577537e0073da8fb99e9f3abffc643b153c4ab719d0d0e1e2dce7637653d4e74, vars: {'day': '31',
'hour': '0',
'month': '12',
'raw_bucket': 'c360-raw-data-********-us-east-1',
'ts': '2023-12-31T00:00:00+00:00',
'year': '2023'}, profile: , target: , version: 1.7.4
[0m02:50:25.260540 [debug] [MainThread]: Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
[0m02:50:25.261176 [debug] [MainThread]: Partial parsing enabled, no changes found, skipping parsing
[0m02:50:25.269396 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6175a6f10>]}
[0m02:50:25.272216 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6176cb2b0>]}
[0m02:50:25.272952 [info ] [MainThread]: Found 7 models, 6 sources, 0 exposures, 0 metrics, 439 macros, 0 groups, 0 semantic models
[0m02:50:25.273827 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6176cb2e0>]}
[0m02:50:25.276742 [info ] [MainThread]:
[0m02:50:25.278117 [debug] [MainThread]: Acquiring new spark connection 'master'
[0m02:50:25.280561 [debug] [ThreadPool]: Acquiring new spark connection 'list_None_c360bronze'
[0m02:50:25.295222 [debug] [ThreadPool]: Spark adapter: NotImplemented: add_begin_query
[0m02:50:25.295972 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:25.296507 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
show table extended in c360bronze like '*'
[0m02:50:25.296985 [debug] [ThreadPool]: Opening a new connection, currently in state init
[0m02:50:25.447602 [debug] [ThreadPool]: Spark adapter: Poll response: TGetOperationStatusResp(status=TStatus(statusCode=0, infoMessages=None, sqlState=None, errorCode=None, errorMessage=None), operationState=5, sqlState=None, errorCode=0, errorMessage='org.apache.hive.service.cli.HiveSQLException: Error running query: [_LEGACY_ERROR_TEMP_1200] org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;\nShowTableExtended *, [namespace#9839, tableName#9840, isTemporary#9841, information#9842]\n+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@6b21d3da, [c360bronze]\n\n\tat org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:43)\n\tat
org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)\n\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:226)\n\t... 16 more\n', taskStatus=None, operationStarted=None, operationCompleted=None, hasResultSet=None, progressUpdateResponse=None)
[0m02:50:25.448538 [debug] [ThreadPool]: Spark adapter: Poll status: 5
[0m02:50:25.449121 [debug] [ThreadPool]: Spark adapter: Error while running:
/* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
show table extended in c360bronze like '*'
[0m02:50:25.449863 [debug] [ThreadPool]: Spark adapter: Database Error
org.apache.hive.service.cli.HiveSQLException: Error running query: [_LEGACY_ERROR_TEMP_1200] org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
ShowTableExtended *, [namespace#9839, tableName#9840, isTemporary#9841, information#9842]
+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@6b21d3da, [c360bronze]
at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:43)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:261)
Caused by: org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
ShowTableExtended *, [namespace#9839, tableName#9840, isTemporary#9841, information#9842]
+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@6b21d3da, [c360bronze]
at org.apache.spark.sql.errors.QueryCompilationErrors$.commandUnsupportedInV2TableError(QueryCompilationErrors.scala:2040)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$1(CheckAnalysis.scala:224)
... 16 more
[0m02:50:25.450777 [debug] [ThreadPool]: Spark adapter: Error while running:
macro list_relations_without_caching
[0m02:50:25.451525 [debug] [ThreadPool]: Spark adapter: Runtime Error
Database Error
org.apache.hive.service.cli.HiveSQLException: Error running query: [_LEGACY_ERROR_TEMP_1200] org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
ShowTableExtended *, [namespace#9839, tableName#9840, isTemporary#9841, information#9842]
+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@6b21d3da, [c360bronze]
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
ShowTableExtended *, [namespace#9839, tableName#9840, isTemporary#9841, information#9842]
+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@6b21d3da, [c360bronze]
at org.apache.spark.sql.errors.QueryCompilationErrors$.commandUnsupportedInV2TableError(QueryCompilationErrors.scala:2040)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$1(CheckAnalysis.scala:224)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$1$adapted(CheckAnalysis.scala:163)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:338)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:226)
... 16 more
[0m02:50:25.457505 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:25.458084 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
show tables in c360bronze like '*'
[0m02:50:25.697421 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:25.698186 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:25.708726 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:25.709422 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_clickstream
[0m02:50:25.895379 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:25.896238 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:25.905061 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:25.905728 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_clickstream2
[0m02:50:26.128223 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:26.128935 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:26.139356 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:26.140353 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_salesdb__cart_items
[0m02:50:26.370865 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:26.371741 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:26.381025 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:26.381722 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_salesdb__customer
[0m02:50:26.572163 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:26.573853 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:26.584021 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:26.584680 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_salesdb__order_items
[0m02:50:26.783624 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:26.784357 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:26.795421 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:26.796071 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_salesdb__product
[0m02:50:26.987528 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:26.988230 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:26.996066 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:26.996669 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_salesdb__product_rating
[0m02:50:27.228290 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:27.229005 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:27.237495 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:27.238161 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_supportdb__support_chat
[0m02:50:27.439212 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:27.439901 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:27.445604 [debug] [ThreadPool]: On list_None_c360bronze: ROLLBACK
[0m02:50:27.446268 [debug] [ThreadPool]: Spark adapter: NotImplemented: rollback
[0m02:50:27.446799 [debug] [ThreadPool]: On list_None_c360bronze: Close
[0m02:50:27.570900 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa617832d60>]}
[0m02:50:27.572181 [info ] [MainThread]: Concurrency: 1 threads (target='dev')
[0m02:50:27.573155 [info ] [MainThread]:
[0m02:50:27.576111 [debug] [Thread-1 ]: Began running node model.c360.stg_clickstream
[0m02:50:27.578024 [debug] [Thread-1 ]: Re-using an available connection from the pool (formerly list_None_c360bronze, now model.c360.stg_clickstream)
[0m02:50:27.578766 [debug] [Thread-1 ]: Began compiling node model.c360.stg_clickstream
[0m02:50:27.603421 [debug] [Thread-1 ]: Writing injected SQL for node "model.c360.stg_clickstream"
[0m02:50:27.604594 [debug] [Thread-1 ]: Timing info for model.c360.stg_clickstream (compile): 02:50:27.579148 => 02:50:27.604210
[0m02:50:27.605277 [debug] [Thread-1 ]: Began executing node model.c360.stg_clickstream
[0m02:50:27.606030 [debug] [Thread-1 ]: Timing info for model.c360.stg_clickstream (execute): 02:50:27.605629 => 02:50:27.605653
[0m02:50:27.607610 [debug] [Thread-1 ]: Finished running node model.c360.stg_clickstream
[0m02:50:27.608426 [debug] [Thread-1 ]: Began running node model.c360.stg_salesdb__cart_items
[0m02:50:27.609975 [debug] [Thread-1 ]: Re-using an available connection from the pool (formerly model.c360.stg_clickstream, now model.c360.stg_salesdb__cart_items)
[0m02:50:27.610700 [debug] [Thread-1 ]: Began compiling node model.c360.stg_salesdb__cart_items
[0m02:50:27.619178 [debug] [Thread-1 ]: Writing injected SQL for node "model.c360.stg_salesdb__cart_items"
[0m02:50:27.620232 [debug] [Thread-1 ]: Timing info for model.c360.stg_salesdb__cart_items (compile): 02:50:27.611076 => 02:50:27.619876
[0m02:50:27.620860 [debug] [Thread-1 ]: Began executing node model.c360.stg_salesdb__cart_items
[0m02:50:27.621648 [debug] [Thread-1 ]: Timing info for model.c360.stg_salesdb__cart_items (execute): 02:50:27.621216 => 02:50:27.621229
[0m02:50:27.624942 [debug] [Thread-1 ]: Finished running node model.c360.stg_salesdb__cart_items
[0m02:50:27.625894 [debug] [Thread-1 ]: Began running node model.c360.stg_salesdb__customer
[0m02:50:27.627310 [debug] [Thread-1 ]: Re-using an available connection from the pool (formerly model.c360.stg_salesdb__cart_items, now model.c360.stg_salesdb__customer)
[0m02:50:27.628163 [debug] [Thread-1 ]: Began compiling node model.c360.stg_salesdb__customer
[0m02:50:27.635786 [debug] [Thread-1 ]: Writing injected SQL for node "model.c360.stg_salesdb__customer"
[0m02:50:27.636779 [debug] [Thread-1 ]: Timing info for model.c360.stg_salesdb__customer (compile): 02:50:27.628650 => 02:50:27.636440
[0m02:50:27.637526 [debug] [Thread-1 ]: Began executing node model.c360.stg_salesdb__customer
[0m02:50:27.638335 [debug] [Thread-1 ]: Timing info for model.c360.stg_salesdb__customer (execute): 02:50:27.637949 => 02:50:27.637961
[0m02:50:27.639622 [debug] [Thread-1 ]: Finished running node model.c360.stg_salesdb__customer
[0m02:50:27.640276 [debug] [Thread-1 ]: Began running node model.c360.stg_salesdb__order_items
[0m02:50:27.641442 [debug] [Thread-1 ]: Re-using an available connection from the pool (formerly model.c360.stg_salesdb__customer, now model.c360.stg_salesdb__order_items)
[0m02:50:27.642196 [debug] [Thread-1 ]: Began compiling node model.c360.stg_salesdb__order_items
[0m02:50:27.650906 [debug] [Thread-1 ]: Writing injected SQL for node "model.c360.stg_salesdb__order_items"
[0m02:50:27.652276 [debug] [Thread-1 ]: Timing info for model.c360.stg_salesdb__order_items (compile): 02:50:27.642683 => 02:50:27.651808
[0m02:50:27.653043 [debug] [Thread-1 ]: Began executing node model.c360.stg_salesdb__order_items
[0m02:50:27.653692 [debug] [Thread-1 ]: Timing info for model.c360.stg_salesdb__order_items (execute): 02:50:27.653397 => 02:50:27.653410
[0m02:50:27.655082 [debug] [Thread-1 ]: Finished running node model.c360.stg_salesdb__order_items
[0m02:50:27.655742 [debug] [Thread-1 ]: Began running node model.c360.stg_salesdb__product
[0m02:50:27.656697 [debug] [Thread-1 ]: Re-using an available connection from the pool (formerly model.c360.stg_salesdb__order_items, now model.c360.stg_salesdb__product)
[0m02:50:27.657630 [debug] [Thread-1 ]: Began compiling node model.c360.stg_salesdb__product
[0m02:50:27.744326 [debug] [Thread-1 ]: Writing injected SQL for node "model.c360.stg_salesdb__product"
[0m02:50:27.745610 [debug] [Thread-1 ]: Timing info for model.c360.stg_salesdb__product (compile): 02:50:27.658152 => 02:50:27.745075
[0m02:50:27.746830 [debug] [Thread-1 ]: Began executing node model.c360.stg_salesdb__product
[0m02:50:27.747641 [debug] [Thread-1 ]: Timing info for model.c360.stg_salesdb__product (execute): 02:50:27.747323 => 02:50:27.747337
[0m02:50:27.749546 [debug] [Thread-1 ]: Finished running node model.c360.stg_salesdb__product
[0m02:50:27.750300 [debug] [Thread-1 ]: Began running node model.c360.stg_salesdb__product_rating
[0m02:50:27.751857 [debug] [Thread-1 ]: Re-using an available connection from the pool (formerly model.c360.stg_salesdb__product, now model.c360.stg_salesdb__product_rating)
[0m02:50:27.752630 [debug] [Thread-1 ]: Began compiling node model.c360.stg_salesdb__product_rating
[0m02:50:27.760353 [debug] [Thread-1 ]: Writing injected SQL for node "model.c360.stg_salesdb__product_rating"
[0m02:50:27.761355 [debug] [Thread-1 ]: Timing info for model.c360.stg_salesdb__product_rating (compile): 02:50:27.753148 => 02:50:27.761013
[0m02:50:27.762149 [debug] [Thread-1 ]: Began executing node model.c360.stg_salesdb__product_rating
[0m02:50:27.762897 [debug] [Thread-1 ]: Timing info for model.c360.stg_salesdb__product_rating (execute): 02:50:27.762503 => 02:50:27.762526
[0m02:50:27.764336 [debug] [Thread-1 ]: Finished running node model.c360.stg_salesdb__product_rating
[0m02:50:27.765582 [debug] [Thread-1 ]: Began running node model.c360.stg_supportdb__support_chat
[0m02:50:27.768105 [debug] [Thread-1 ]: Re-using an available connection from the pool (formerly model.c360.stg_salesdb__product_rating, now model.c360.stg_supportdb__support_chat)
[0m02:50:27.768876 [debug] [Thread-1 ]: Began compiling node model.c360.stg_supportdb__support_chat
[0m02:50:27.776378 [debug] [Thread-1 ]: Writing injected SQL for node "model.c360.stg_supportdb__support_chat"
[0m02:50:27.777509 [debug] [Thread-1 ]: Timing info for model.c360.stg_supportdb__support_chat (compile): 02:50:27.769323 => 02:50:27.777059
[0m02:50:27.778705 [debug] [Thread-1 ]: Began executing node model.c360.stg_supportdb__support_chat
[0m02:50:27.779701 [debug] [Thread-1 ]: Timing info for model.c360.stg_supportdb__support_chat (execute): 02:50:27.779171 => 02:50:27.779367
[0m02:50:27.781293 [debug] [Thread-1 ]: Finished running node model.c360.stg_supportdb__support_chat
[0m02:50:27.782634 [debug] [MainThread]: Connection 'master' was properly closed.
[0m02:50:27.783134 [debug] [MainThread]: Connection 'model.c360.stg_supportdb__support_chat' was properly closed.
[0m02:50:27.785129 [debug] [MainThread]: Command end result
[0m02:50:27.800011 [debug] [MainThread]: Acquiring new spark connection 'generate_catalog'
[0m02:50:27.800584 [info ] [MainThread]: Building catalog
[0m02:50:27.804828 [debug] [ThreadPool]: Acquiring new spark connection 'spark_catalog.c360raw'
[0m02:50:27.805619 [debug] [ThreadPool]: On "spark_catalog.c360raw": cache miss for schema ".spark_catalog.c360raw", this is inefficient
[0m02:50:27.811565 [debug] [ThreadPool]: Spark adapter: NotImplemented: add_begin_query
[0m02:50:27.812110 [debug] [ThreadPool]: Using spark connection "spark_catalog.c360raw"
[0m02:50:27.812600 [debug] [ThreadPool]: On spark_catalog.c360raw: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "spark_catalog.c360raw"} */
show table extended in spark_catalog.c360raw like '*'
[0m02:50:27.813343 [debug] [ThreadPool]: Opening a new connection, currently in state init
[0m02:50:30.996262 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:30.997081 [debug] [ThreadPool]: SQL status: OK in 3.0 seconds
[0m02:50:31.017468 [debug] [ThreadPool]: While listing relations in database=, schema=spark_catalog.c360raw, found: cart_items, customer, order_items, product, product_rating, simulation, support_chat
[0m02:50:31.018414 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.cart_items
[0m02:50:31.019253 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.customer
[0m02:50:31.020171 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.order_items
[0m02:50:31.020980 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.product
[0m02:50:31.021837 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.product_rating
[0m02:50:31.022754 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.simulation
[0m02:50:31.023464 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.support_chat
[0m02:50:31.030727 [debug] [ThreadPool]: On spark_catalog.c360raw: ROLLBACK
[0m02:50:31.032244 [debug] [ThreadPool]: Spark adapter: NotImplemented: rollback
[0m02:50:31.034169 [debug] [ThreadPool]: On spark_catalog.c360raw: Close
[0m02:50:31.153712 [debug] [ThreadPool]: Re-using an available connection from the pool (formerly spark_catalog.c360raw, now c360bronze)
[0m02:50:31.154930 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_clickstream
[0m02:50:31.156753 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_clickstream2
[0m02:50:31.159402 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_salesdb__cart_items
[0m02:50:31.160427 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_salesdb__customer
[0m02:50:31.161208 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_salesdb__order_items
[0m02:50:31.161775 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_salesdb__product
[0m02:50:31.162315 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_salesdb__product_rating
[0m02:50:31.162877 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_supportdb__support_chat
[0m02:50:31.191842 [info ] [MainThread]: Catalog written to /home/ec2-user/environment/dbtproject/dags/dbt_blueprint/c360-datalake/target/catalog.json
[0m02:50:31.195884 [debug] [MainThread]: Resource report: {"command_name": "generate", "command_success": true, "command_wall_clock_time": 6.6277905, "process_user_time": 3.394287, "process_kernel_time": 0.149442, "process_mem_max_rss": "104176", "process_out_blocks": "4960", "process_in_blocks": "0"}
[0m02:50:31.198437 [debug] [MainThread]: Command `dbt docs generate` succeeded at 02:50:31.198171 after 6.63 seconds
[0m02:50:31.199040 [debug] [MainThread]: Connection 'generate_catalog' was properly closed.
[0m02:50:31.199666 [debug] [MainThread]: Connection 'c360bronze' was properly closed.
[0m02:50:31.200187 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa619c86bb0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6179c4370>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6177c2af0>]}
[0m02:50:31.201812 [debug] [MainThread]: Flushing usage events
Environment
- OS: Amazon Linux
- Python: 3.10
- dbt-core: 1.7.4
- dbt-spark: 1.7.0
Additional Context
This is the catalog.json generated by dbt docs generate:
{"metadata": {"dbt_schema_version": "https://schemas.getdbt.com/dbt/catalog/v1.json", "dbt_version": "1.7.4", "generated_at": "2024-01-02T02:39:55.359210Z", "invocation_id": "4f9b9ed4-e962-49bf-8329-df43b335419a", "env": {}}, "nodes": {}, "sources": {}, "errors": null}
emr_dag_automation_blueprint.py.txt — the attached DAG can be used to create the EMR cluster with the above configuration.
Here is the requirements.txt:
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.6.3/constraints-3.10.txt"
apache-airflow==2.6.3
apache-airflow-providers-salesforce
apache-airflow-providers-apache-spark
apache-airflow-providers-amazon
apache-airflow-providers-postgres
apache-airflow-providers-mongo
apache-airflow-providers-ssh
apache-airflow-providers-common-sql
astronomer-cosmos
boto3
simplejson
pymongo
pymssql
smart-open
psycopg2==2.9.5
simple-salesforce