incubator-xtable
AWS Glue with Iceberg Source and Delta Target
I keep getting a version-hint.text error when trying to use the .jar file to sync an existing Iceberg table to Delta, using AWS Glue as my catalog. I'm using the my_config.yaml file as directed in the docs, but I feel like I'm missing something.
For instance:
sourceFormat: ICEBERG
targetFormats:
  - DELTA
datasets:
  -
    # (I have this set to the parent path that holds the metadata, data, and _delta_log directories; I've tried other configurations as well. The _delta_log directory gets created, but nothing gets written into it, and I keep getting the error indicated above.)
    tableBasePath: s3://path
    tableName: my_table
Do I need to create a catalog.yaml and clients.yaml file as well? I noticed these, but I thought they were only for implementing custom sources other than Hudi, Iceberg, or Delta. If I do need to create these files, what would I set catalogImpl: to for a source Iceberg table that is managed by AWS Glue?
catalogImpl: io.my.CatalogImpl
catalogName: name
catalogOptions: # all other options are passed through in a map
  key1: value1
  key2: value2
Thanks in advance!
Yes you will need a catalog yaml to specify which catalog implementation you are using for the Iceberg source. If none is specified, it assumes you're not using an external catalog. Once we get this working it would be good to add some feedback or submit a PR for updates to the README since this part is not so clear right now :)
Hi, thank you! You will have to forgive me because I'm about to sound dumb, but what would I put for 'catalogImpl: io.my.CatalogImpl' if I am using AWS managed Glue with Iceberg? And what keys would I need to provide?
@the-other-tim-brown - I am also stuck on this particular issue, do you have an example of what this config would look like? Thanks.
@MrDerecho to catalog with glue while running sync, you can look at this example: https://medium.com/@sagarlakshmipathy/using-onetable-to-translate-a-hudi-table-to-iceberg-format-and-sync-with-glue-catalog-8c3071f08877
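For reference, a minimal catalog.yaml for an Iceberg source registered in AWS Glue would typically point at Iceberg's built-in Glue catalog implementation. This is only a sketch based on standard Iceberg catalog properties; the catalog name and warehouse path below are placeholders, and your setup may need additional options:

catalogImpl: org.apache.iceberg.aws.glue.GlueCatalog
catalogName: glue
catalogOptions:
  warehouse: s3://your-bucket/your-warehouse-path
  io-impl: org.apache.iceberg.aws.s3.S3FileIO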
@sagarlakshmipathy thanks for this info! If I'm running OneTable in a Docker container, do I need to use JDK 8 or 11? I noticed that the docs say 11, but the pom file that builds the utilities jar uses 8. Maybe you can help demystify which JDK should be used for build vs. run?
@ForeverAngry we are currently using Java 11 for development and building the jars, but we set a lower target so that projects that are still on Java 8 can use the jar. Your runtime environment just needs to be Java 8 or above for a Java 8 jar to work properly.
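In Maven terms that typically looks something like the snippet below (shown only as an illustration of source/target settings; I haven't verified it against the project's actual pom):

<properties>
  <!-- illustrative only: build with a newer JDK (e.g. 11) while emitting Java 8 compatible bytecode -->
  <maven.compiler.source>8</maven.compiler.source>
  <maven.compiler.target>8</maven.compiler.target>
</properties>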
Maybe I'm doing something wrong here, but when I try to convert Iceberg to Delta using a Glue catalog, I keep getting an error that the class constructor for catalog.Catalog is not found. I've tried adding the iceberg-aws-bundle to the classpath, as well as many other jars, but still the same error. I'm executing the OneTable jar sync job from a Docker image. Maybe the fact that it's a Docker image is causing issues? I'm not sure. Does anyone have examples of specific jar versions that they know work when converting from an Iceberg source managed by Glue to Delta? @the-other-tim-brown @sagarlakshmipathy any help would be very welcomed. I'd love to get this tool working!
@ForeverAngry what is the error message you are seeing and how are you creating this docker image?
@the-other-tim-brown Well, it was for sure an error with the image. But the error I'm getting now is related to the partition. I have a column called "timestamp" that I use the hour partition transform on, and now I get the following error:
org.apache.spark.sql.AnalysisException: cannot resolve 'date_format(timestamp, 'yyyy-MM-dd-HH')' due to data type mismatch: argument 1 requires timestamp type, however, 'timestamp' is of bigint type.; line 1 pos 0;
Any thoughts on how to resolve this error?
@ForeverAngry when you say you have an "identity transformation" on the column, what do you mean by that? It sounds like you are using the hour transform from the Iceberg partitioning docs.
The exception looks like there could be an issue in how the source schema is converted. When converting Iceberg's "hidden partitioning" to Delta Lake, we use a generated column to represent the partition values.
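To make that mapping concrete, here is a minimal PySpark sketch (illustrative only, not OneTable's actual code) of the table shape the Delta target aims for when the Iceberg source is partitioned by hour(timestamp). The generated-column expression matches the one in the error above, while the table path and generated-column name are placeholders:

# Assumes delta-spark is installed and the session has the Delta Lake extensions.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

(
    DeltaTable.create(spark)
    .location("/tmp/example_delta_table")  # placeholder path
    .addColumn("timestamp", "TIMESTAMP")   # must be a timestamp type, not BIGINT
    .addColumn(
        "partition_hour",                  # hypothetical name for the generated column
        "STRING",
        generatedAlwaysAs="date_format(timestamp, 'yyyy-MM-dd-HH')",
    )
    .partitionedBy("partition_hour")
    .execute()
)
# Delta validates the generated-column expression at commit time; if the source
# column arrives as BIGINT instead of TIMESTAMP, that validation fails with the
# AnalysisException shown above.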
@the-other-tim-brown yeah I misspoke in my comment, you are correct, I'm using the hour transform.
Ok can you attach more of the stacktrace so I can see where exactly that error is thrown?
@the-other-tim-brown yeah here is the error:
2024-03-05 20:30:06 ERROR io.onetable.spi.sync.TableFormatSync:78 - Failed to sync snapshot org.apache.spark.sql.AnalysisException: cannot resolve 'date_format(timestamp, 'yyyy-MM-dd-HH')' due to data type mismatch: argument 1 requires timestamp type, however, 'timestamp' is of bigint type.; line 1 pos 0;
@the-other-tim-brown - additional context: the column name is "timestamp", which is partitioned with the hour transform. Note that when I query the partitions, they appear to be stored as Unix hours, but the logical partitions in S3 are formed as "yyyy-MM-dd-HH".
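That lines up with how Iceberg's hour() transform works: the stored partition value is whole hours since the Unix epoch, while the folder name written under S3 uses the human-readable yyyy-MM-dd-HH form. A quick way to see both representations for one timestamp (small illustrative Python snippet, unrelated to the sync itself):

from datetime import datetime, timezone

ts = datetime(2024, 3, 8, 21, 30, tzinfo=timezone.utc)
print(int(ts.timestamp() // 3600))  # value seen when querying partitions: hours since epoch
print(ts.strftime("%Y-%m-%d-%H"))   # form seen in the S3 prefix, e.g. 2024-03-08-21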
Is this the full stacktrace? If so I'll need to investigate why we're not getting more error content in the logs.
@the-other-tim-brown that's not the full stack trace, just the error. The machine the program is on is separate and not able to post here, so I have to type a lot of this in by hand.
Perhaps you could just let me know: does OneTable have the ability to take a timestamp field, stored as a timestamp, that uses the hour transform from an Iceberg source and convert that to Delta, or is that not supported? I guess this is what I really want to know.
This is definitely a bug somewhere in the partition handling logic. We want to support this case, so it just requires developing an end-to-end test case in the repo to reproduce the issue in a way that is easier to debug.
@the-other-tim-brown I was able to get the full stack trace printed out of the machine. I made most of the fields random names like "f1, f2, ..." except for the field the data is partitioned on using the hour transform from Iceberg (the one that is failing). I also redacted the S3 path names.
Just to restate, I'm going from an existing Iceberg table in AWS, using S3 and Glue, to Delta (not Databricks Delta). The existing Iceberg table was created by Athena.
Take a look and let me know what you think or if anything stands out! Any help would be awesome. Other partition functions work fine. The project is really awesome!
2024-03-08 21:24:59 INFO org.apache.spark.sql.delta.DeltaLog:57 - No delta log found for the Delta table at s3://<redacted>
2024-03-08 21:24:59 INFO org.apache.spark.sql.delta.InitialSnapshot:57 - [tableId=0e175eb3-ad57-4432-8d71-cd3d58b839a2] Created snapshot InitialSnapshot(path=s3://<redacted>/data/_delta_log, version=-1, metadata=Metadata(081118d2-4985-476f-9za6-07b80359891c,null,null,Format(parquet,Map()),null,List(),Map(),Some(1709933099519)), logSegment=LogSegment(s3://<redacted>/data/_delta_log,-1,List(),List(),None,-1), checksumOpt=None)
2024-03-08 21:25:03 ERROR io.onetable.spi.sync.TableFormatSync:78 - Failed to sync snapshot
org.apache.spark.sql.AnalysisException: cannot resolve 'date_format(timestamp, 'yyyy-MM-dd-HH')' due to data type mismatch: argument 1 requires timestamp type, however, 'timestamp' is of bigint type.; line 1 pos
0;
'Project [date_format(timestamp#91L, yyyy-MM-dd-HH, Some(Etc/UTC)) AS onetable_partition_col_HOUR_timestamp#135]
+- LocalRelation <empty>, [timestamp#91L, f0#92, f1#93, f2#94, f14#95, f11#96, f4#97, f13#98, f5#99, f6#100, f7#101L, f8#102L, f9#103, f10#104, f11#105, f12#106L, f13#107, f14#108, f15#109L, f16#110, f17#111L, f18#112]
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:190) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1122) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1121) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.immutable.List.foreach(List.scala:392) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.TraversableLike.map(TraversableLike.scala:238) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.TraversableLike.map$(TraversableLike.scala:231) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.immutable.List.map(List.scala:298) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:181) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:161) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:175) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:263) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:91) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:182) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:205) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:202) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:88) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:196) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:196) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:88) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:86) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:78) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:90) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:88) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.Dataset.withPlan(Dataset.scala:3734) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.Dataset.select(Dataset.scala:1454) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.GeneratedColumn$.validateGeneratedColumns(GeneratedColumn.scala:196) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.$anonfun$verifyNewMetadata$2(OptimisticTransaction.scala:431) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:141) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:139) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.recordFrameProfile(OptimisticTransaction.scala:100) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:134) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.recordOperation(OptimisticTransaction.scala:100) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:133) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:123) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:111) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.recordDeltaOperation(OptimisticTransaction.scala:100) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.verifyNewMetadata(OptimisticTransaction.scala:431) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.verifyNewMetadata$(OptimisticTransaction.scala:402) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.verifyNewMetadata(OptimisticTransaction.scala:100) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.updateMetadataInternal(OptimisticTransaction.scala:382) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.updateMetadataInternal$(OptimisticTransaction.scala:296) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.updateMetadataInternal(OptimisticTransaction.scala:100) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.updateMetadata(OptimisticTransaction.scala:289) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.updateMetadata$(OptimisticTransaction.scala:284) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.updateMetadata(OptimisticTransaction.scala:100) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.delta.DeltaClient$TransactionState.commitTransaction(DeltaClient.java:259) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.delta.DeltaClient$TransactionState.access$300(DeltaClient.java:215) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.delta.DeltaClient.completeSync(DeltaClient.java:198) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.spi.sync.TableFormatSync.getSyncResult(TableFormatSync.java:165) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.spi.sync.TableFormatSync.syncSnapshot(TableFormatSync.java:70) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.client.OneTableClient.syncSnapshot(OneTableClient.java:179) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.client.OneTableClient.sync(OneTableClient.java:116) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.utilities.RunSync.main(RunSync.java:168) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
2024-03-08 21:25:03 ERROR io.onetable.client.OneTableClient:133 - Sync failed for the following formats DELTA
Can you confirm the field schema type in the Iceberg metadata and in the Parquet file metadata? I tried a local example with hour partitioning on a timestamp field and it worked. I'm wondering if the field is actually a long in the schema.
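One quick way to check the Parquet side (an assumed approach, not something specific to this project) is to copy one data file down locally and read its footer schema with pyarrow:

import pyarrow.parquet as pq

# placeholder local path for a data file copied from the table's data/ prefix
schema = pq.read_schema("/tmp/one_data_file.parquet")
print(schema.field("timestamp"))
# An Iceberg timestamp (microseconds, no timezone) should show up as timestamp[us];
# if this prints int64 instead, the files themselves carry a long value.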
@the-other-tim-brown so, I rebuilt the table again to make sure that there wasn't something odd with the partitions. It's odd: the only change I made when rebuilding the table was using uppercase PARTITION BY HOUR(timestamp), but now I'm getting a different error:
Output: 2024-03-09 00:06:16 INFO io.onetable.utilities.RunSync:147 - Running sync for basePath s3://<redacted> for following table formats [DELTA]
2024-03-09 00:06:17 WARN org.apache.hadoop.util.NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-03-09 00:06:19 WARN org.apache.hadoop.metrics2.impl.MetricsConfig:136 - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2024-03-09 00:06:19 INFO org.apache.spark.sql.delta.storage.DelegatingLogStore:57 - LogStore `LogStoreAdapter(io.delta.storage.S3SingleDriverLogStore)` is used for scheme `s3`
2024-03-09 00:06:20 INFO org.apache.spark.sql.delta.DeltaLog:57 - Creating initial snapshot without metadata, because the directory is empty
2024-03-09 00:06:20 INFO org.apache.spark.sql.delta.InitialSnapshot:57 - [tableId=cafeb292-12a0-4a27-9382-d436a0977e46] Created snapshot InitialSnapshot(path=s3://<redacted>/data/_delta_log, version=-1, metadata=Metadata(ad0d3e83-149a-4904-afd8-8d655b49f1f8,null,null,Format(parquet,Map()),null,List(),Map(),Some(1709942780618)), logSegment=LogSegment(s3://<redacted>/data/_delta_log,-1,List(),List(),None,-1), checksumOpt=None)
2024-03-09 00:06:20 INFO io.onetable.client.OneTableClient:234 - No previous OneTable sync for target. Falling back to snapshot sync.
2024-03-09 00:06:21 INFO org.apache.iceberg.CatalogUtil:302 - Loading custom FileIO implementation: org.apache.iceberg.aws.s3.S3FileIO
2024-03-09 00:06:22 INFO org.apache.iceberg.BaseMetastoreTableOperations:199 - Refreshing table metadata from new version: s3://<redacted>/metadata/00000-585a7096-6160-4bc7-a3d9-a60634380092.metadata.json
2024-03-09 00:06:22 INFO org.apache.iceberg.BaseMetastoreCatalog:67 - Table loaded by catalog: onetable.one_table_demo.bluvector_sensor_zeek_connection_log_dev_test_v3
2024-03-09 00:06:22 INFO org.apache.iceberg.SnapshotScan:124 - Scanning table onetable.one_table_demo.bluvector_sensor_zeek_connection_log_dev_test_v3 snapshot 802574091683561956 created at 2024-03-08T22:31:16.976+00:00 with filter true
2024-03-09 00:06:23 INFO org.apache.iceberg.metrics.LoggingMetricsReporter:38 - Received metrics report: ScanReport{tableName=onetable.one_table_demo.bluvector_sensor_zeek_connection_log_dev_test_v3, snapshotId=802574091683561956, filter=true, schemaId=0, projectedFieldIds=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22], projectedFieldNames=[timestamp, uid, sensor_name,
source_ip, source_port, destination_ip, destination_port, proto, service, duration, source_bytes, destination_bytes, conn_state, local_source, local_destination, missed_bytes, history, source_pkts, source_ip_bytes, destination_pkts, destination_ip_bytes, tunnel_parents], scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT0.606986723S, count=1}, resultDataFiles=CounterResult{unit=COUNT, value=116}, resultDeleteFiles=CounterResult{unit=COUNT, value=0}, totalDataManifests=CounterResult{unit=COUNT, value=1}, totalDeleteManifests=CounterResult{unit=COUNT, value=0}, scannedDataManifests=CounterResult{unit=COUNT, value=1}, skippedDataManifests=CounterResult{unit=COUNT, value=0}, totalFileSizeInBytes=CounterResult{unit=BYTES, value=3635593429}, totalDeleteFileSizeInBytes=CounterResult{unit=BYTES, value=0}, skippedDataFiles=CounterResult{unit=COUNT, value=0}, skippedDeleteFiles=CounterResult{unit=COUNT, value=0}, scannedDeleteManifests=CounterResult{unit=COUNT, value=0}, skippedDeleteManifests=CounterResult{unit=COUNT, value=0}, indexedDeleteFiles=CounterResult{unit=COUNT, value=0}, equalityDeleteFiles=CounterResult{unit=COUNT, value=0}, positionalDeleteFiles=CounterResult{unit=COUNT, value=0}}, metadata={iceberg-version=Apache Iceberg 1.4.2 (commit f6bb9173b13424d77e7ad8439b5ef9627e530cb2)}}
2024-03-09 00:06:23 ERROR io.onetable.utilities.RunSync:170 - Error running sync for s3://<redacted>
java.lang.NullPointerException: null
at io.onetable.iceberg.IcebergPartitionValueConverter.toOneTable(IcebergPartitionValueConverter.java:117) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.iceberg.IcebergSourceClient.fromIceberg(IcebergSourceClient.java:165) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.iceberg.IcebergSourceClient.getCurrentSnapshot(IcebergSourceClient.java:147) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:35) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.client.OneTableClient.syncSnapshot(OneTableClient.java:177) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.client.OneTableClient.sync(OneTableClient.java:116) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.utilities.RunSync.main(RunSync.java:168) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
@ForeverAngry can you grab the schema of the Iceberg table?
@the-other-tim-brown Here it is, let me know if this wasn't what you were looking for!
{
"format-version" : 2,
"table-uuid" : "03e38c37-0ac5-4cae-a66f-49038931f265",
"location" : "s3://<redacted>",
"last-sequence-number" : 114,
"last-updated-ms" : 1709948427780,
"last-column-id" : 22,
"current-schema-id" : 0,
"schemas" : [ {
"type" : "struct",
"schema-id" : 0,
"fields" : [ {
"id" : 1,
"name" : "timestamp",
"required" : false,
"type" : "timestamp"
}, {
"id" : 2,
"name" : "uid",
"required" : false,
"type" : "string"
}, {
"id" : 3,
"name" : "sensor_name",
"required" : false,
"type" : "string"
}, {
"id" : 4,
"name" : "source_ip",
"required" : false,
"type" : "string"
}, {
"id" : 5,
"name" : "source_port",
"required" : false,
"type" : "int"
}, {
"id" : 6,
"name" : "destination_ip",
"required" : false,
"type" : "string"
}, {
"id" : 7,
"name" : "destination_port",
"required" : false,
"type" : "int"
}, {
"id" : 8,
"name" : "proto",
"required" : false,
"type" : "string"
}, {
"id" : 9,
"name" : "service",
"required" : false,
"type" : "string"
}, {
"id" : 10,
"name" : "duration",
"required" : false,
"type" : "double"
}, {
"id" : 11,
"name" : "source_bytes",
"required" : false,
"type" : "long"
}, {
"id" : 12,
"name" : "destination_bytes",
"required" : false,
"type" : "long"
}, {
"id" : 13,
"name" : "conn_state",
"required" : false,
"type" : "string"
}, {
"id" : 14,
"name" : "local_source",
"required" : false,
"type" : "string"
}, {
"id" : 15,
"name" : "local_destination",
"required" : false,
"type" : "string"
}, {
"id" : 16,
"name" : "missed_bytes",
"required" : false,
"type" : "long"
}, {
"id" : 17,
"name" : "history",
"required" : false,
"type" : "string"
}, {
"id" : 18,
"name" : "source_pkts",
"required" : false,
"type" : "int"
}, {
"id" : 19,
"name" : "source_ip_bytes",
"required" : false,
"type" : "long"
}, {
"id" : 20,
"name" : "destination_pkts",
"required" : false,
"type" : "int"
}, {
"id" : 21,
"name" : "destination_ip_bytes",
"required" : false,
"type" : "long"
}, {
"id" : 22,
"name" : "tunnel_parents",
"required" : false,
"type" : "string"
} ]
} ],
"default-spec-id" : 0,
"partition-specs" : [ {
"spec-id" : 0,
"fields" : [ {
"name" : "timestamp_hour",
"transform" : "hour",
"source-id" : 1,
"field-id" : 1000
} ]
} ],
"last-partition-id" : 1000,
"default-sort-order-id" : 0,
"sort-orders" : [ {
"order-id" : 0,
"fields" : [ ]
} ],
"properties" : {
"history.expire.max-snapshot-age-ms" : "259200000",
"athena.optimize.rewrite.min-data-file-size-bytes" : "50000000",
"write.delete.parquet.compression-codec" : "GZIP",
"write.format.default" : "PARQUET",
"write.parquet.compression-codec" : "GZIP",
"write.binpack.min-file-size-bytes" : "50000000",
"write.object-storage.enabled" : "true",
"write.object-storage.path" : "s3://<redacted>/data"
},
Thanks, this is what I was looking for. It lines up with my expectations but I'm not able to reproduce the same error that you saw. Are any of your partitions null or before 1970-01-01? Just trying to think through the common time-based edge cases here.
@the-other-tim-brown So the table had some null values in the timestamp. I got rid of those. Now I'm back to the original error I started with.
The timestamps include milliseconds, but do not have a timezone. Do you think the millisecond precision is an issue?
Also, can you tell me what jar versions you are using when you run your test? Maybe mine are causing an issue.
Here is the stack:
Output: 2024-03-10 02:42:51 INFO io.onetable.utilities.RunSync:147 - Running sync for basePath s3://<redacted> for following table formats [DELTA]
2024-03-10 02:42:51 WARN org.apache.hadoop.util.NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-03-10 02:42:53 WARN org.apache.hadoop.metrics2.impl.MetricsConfig:136 - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2024-03-10 02:42:54 INFO org.apache.spark.sql.delta.storage.DelegatingLogStore:57 - LogStore `LogStoreAdapter(io.delta.storage.S3SingleDriverLogStore)` is used for scheme `s3`
2024-03-10 02:42:55 INFO org.apache.spark.sql.delta.DeltaLog:57 - Creating initial snapshot without metadata, because the directory is empty
2024-03-10 02:42:55 INFO org.apache.spark.sql.delta.InitialSnapshot:57 - [tableId=5ae55a82-f906-438d-8150-a4a8bbec1bc0] Created snapshot InitialSnapshot(path=s3://<redacted>/data/_delta_log, version=-1, metadata=Metadata(7a370e68-bd4f-4908-8893-6bd01b5969b3,null,null,Format(parquet,Map()),null,List(),Map(),Some(1710038575593)), logSegment=LogSegment(s3://<redacted>/data/_delta_log,-1,List(),List(),None,-1), checksumOpt=None)
2024-03-10 02:42:55 INFO io.onetable.client.OneTableClient:234 - No previous OneTable sync for target. Falling back to snapshot sync.
2024-03-10 02:42:56 INFO org.apache.iceberg.CatalogUtil:302 - Loading custom FileIO implementation: org.apache.iceberg.aws.s3.S3FileIO
2024-03-10 02:42:57 INFO org.apache.iceberg.BaseMetastoreTableOperations:199 - Refreshing table metadata from new version: s3://<redacted>/metadata/00116-d628cb20-f502-445e-a87a-b0639b830259.metadata.json
2024-03-10 02:42:58 INFO org.apache.iceberg.BaseMetastoreCatalog:67 - Table loaded by catalog: onetable.titanflow_laker_trusted.bluvector_sensor_zeek_connection_log_dev
2024-03-10 02:42:58 INFO org.apache.iceberg.SnapshotScan:124 - Scanning table onetable.titanflow_laker_trusted.bluvector_sensor_zeek_connection_log_dev snapshot 8863237097523144183 created at 2024-03-10T02:42:10.959+00:00 with filter true
2024-03-10 02:42:59 INFO org.apache.iceberg.metrics.LoggingMetricsReporter:38 - Received metrics report: ScanReport{tableName=onetable.titanflow_laker_trusted.bluvector_sensor_zeek_connection_log_dev, snapshotId=8863237097523144183, filter=true, schemaId=0, projectedFieldIds=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22], projectedFieldNames=[timestamp, uid, sensor_name, source_ip, source_port, destination_ip, destination_port, proto, service, duration, source_bytes, destination_bytes, conn_state, local_source, local_destination, missed_bytes, history, source_pkts, source_ip_bytes, destination_pkts, destination_ip_bytes, tunnel_parents], scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT0.98888014S, count=1}, resultDataFiles=CounterResult{unit=COUNT, value=456}, resultDeleteFiles=CounterResult{unit=COUNT, value=0}, totalDataManifests=CounterResult{unit=COUNT, value=71}, totalDeleteManifests=CounterResult{unit=COUNT, value=0}, scannedDataManifests=CounterResult{unit=COUNT, value=58}, skippedDataManifests=CounterResult{unit=COUNT, value=13}, totalFileSizeInBytes=CounterResult{unit=BYTES, value=13005026723}, totalDeleteFileSizeInBytes=CounterResult{unit=BYTES, value=0}, skippedDataFiles=CounterResult{unit=COUNT, value=0}, skippedDeleteFiles=CounterResult{unit=COUNT, value=0}, scannedDeleteManifests=CounterResult{unit=COUNT, value=0}, skippedDeleteManifests=CounterResult{unit=COUNT, value=0}, indexedDeleteFiles=CounterResult{unit=COUNT, value=0}, equalityDeleteFiles=CounterResult{unit=COUNT, value=0}, positionalDeleteFiles=CounterResult{unit=COUNT, value=0}}, metadata={iceberg-version=Apache Iceberg 1.4.2 (commit f6bb9173b13424d77e7ad8439b5ef9627e530cb2)}}
2024-03-10 02:42:59 INFO org.apache.spark.sql.delta.DeltaLog:57 - No delta log found for the Delta table at s3://<redacted>/data/_delta_log
2024-03-10 02:42:59 INFO org.apache.spark.sql.delta.InitialSnapshot:57 - [tableId=7a370e68-bd4f-4908-8893-6bd01b5969b3] Created snapshot InitialSnapshot(path=s3://<redacted>/data/_delta_log, version=-1, metadata=Metadata(884081c4-c241-417c-858a-ef3714fa54dc,null,null,Format(parquet,Map()),null,List(),Map(),Some(1710038579829)), logSegment=LogSegment(s3://<redacted>/data/_delta_log,-1,List(),List(),None,-1), checksumOpt=None)
2024-03-10 02:43:05 ERROR io.onetable.spi.sync.TableFormatSync:78 - Failed to sync snapshot
org.apache.spark.sql.AnalysisException: cannot resolve 'date_format(timestamp, 'yyyy-MM-dd-HH')' due to data type mismatch: argument 1 requires timestamp type, however, 'timestamp' is of bigint type.; line 1 pos 0;
'Project [date_format(timestamp#91L, yyyy-MM-dd-HH, Some(Etc/UTC)) AS onetable_partition_col_HOUR_timestamp#135]
+- LocalRelation <empty>, [timestamp#91L, uid#92, sensor_name#93, source_ip#94, source_port#95, destination_ip#96, destination_port#97, proto#98, service#99, duration#100, source_bytes#101L, destination_bytes#102L, conn_state#103, local_source#104, local_destination#105, missed_bytes#106L, history#107, source_pkts#108, source_ip_bytes#109L, destination_pkts#110, destination_ip_bytes#111L, tunnel_parents#112]
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:190) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1122) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1121) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.immutable.List.foreach(List.scala:392) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.TraversableLike.map(TraversableLike.scala:238) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.TraversableLike.map$(TraversableLike.scala:231) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.immutable.List.map(List.scala:298) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:181) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:161) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:175) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:263) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:91) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:182) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:205) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:202) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:88) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:196) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:196) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:88) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:86) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:78) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:90) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:88) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.Dataset.withPlan(Dataset.scala:3734) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.Dataset.select(Dataset.scala:1454) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.GeneratedColumn$.validateGeneratedColumns(GeneratedColumn.scala:196) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.$anonfun$verifyNewMetadata$2(OptimisticTransaction.scala:431) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:141) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:139) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.recordFrameProfile(OptimisticTransaction.scala:100) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:134) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.recordOperation(OptimisticTransaction.scala:100) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:133) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:123) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:111) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.recordDeltaOperation(OptimisticTransaction.scala:100) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.verifyNewMetadata(OptimisticTransaction.scala:431) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.verifyNewMetadata$(OptimisticTransaction.scala:402) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.verifyNewMetadata(OptimisticTransaction.scala:100) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.updateMetadataInternal(OptimisticTransaction.scala:382) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.updateMetadataInternal$(OptimisticTransaction.scala:296) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.updateMetadataInternal(OptimisticTransaction.scala:100) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.updateMetadata(OptimisticTransaction.scala:289) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.updateMetadata$(OptimisticTransaction.scala:284) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.updateMetadata(OptimisticTransaction.scala:100) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.delta.DeltaClient$TransactionState.commitTransaction(DeltaClient.java:259) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.delta.DeltaClient$TransactionState.access$300(DeltaClient.java:215) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.delta.DeltaClient.completeSync(DeltaClient.java:198) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.spi.sync.TableFormatSync.getSyncResult(TableFormatSync.java:165) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.spi.sync.TableFormatSync.syncSnapshot(TableFormatSync.java:70) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.client.OneTableClient.syncSnapshot(OneTableClient.java:179) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.client.OneTableClient.sync(OneTableClient.java:116) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.utilities.RunSync.main(RunSync.java:168) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
2024-03-10 02:43:05 ERROR io.onetable.client.OneTableClient:133 - Sync failed for the following formats DELTA
@ForeverAngry can you confirm that you are on the latest main branch when testing? If so, can you rebuild off of this branch and test again? https://github.com/apache/incubator-xtable/pull/382 I think there is some issue in how we're handling timestamps with/without timezone set.
@the-other-tim-brown I re-pulled the main branch again and rebuilt the jar. Note: I noticed that the run sync class changed from io.onetable.utilities.RunSync to org.apache.xtable.utilities.RunSync, just an FYI; I wasn't sure if this was expected or not. Anyway, it looks like I'm still getting the same error. Let me know what you think.
Output: 2024-03-10 13:06:33 INFO org.apache.xtable.utilities.RunSync:147 - Running sync for basePath s3://<redacted> for following table formats [DELTA]
2024-03-10 13:06:35 WARN org.apache.hadoop.util.NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-03-10 13:06:39 WARN org.apache.hadoop.metrics2.impl.MetricsConfig:136 - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2024-03-10 13:06:40 WARN org.apache.hadoop.fs.s3a.SDKV2Upgrade:39 - Directly referencing AWS SDK V1 credential provider com.amazonaws.auth.DefaultAWSCredentialsProviderChain. AWS SDK V1 credential providers will be removed once S3A is upgraded to SDK V2
2024-03-10 13:06:40 INFO org.apache.spark.sql.delta.storage.DelegatingLogStore:60 - LogStore `LogStoreAdapter(io.delta.storage.S3SingleDriverLogStore)` is used for scheme `s3`
2024-03-10 13:06:42 INFO org.apache.spark.sql.delta.DeltaLog:60 - Creating initial snapshot without metadata, because the directory is empty
2024-03-10 13:06:44 INFO org.apache.spark.sql.delta.InitialSnapshot:60 - [tableId=37adbca2-cc8a-4d2f-a2b2-5a891368c9a5] Created snapshot InitialSnapshot(path=s3://<redacted>/data/_delta_log, version=-1, metadata=Metadata(608fba2e-531b-455c-8a92-e14f1086a557,null,null,Format(parquet,Map()),null,List(),Map(),Some(1710076004533)), logSegment=LogSegment(s3://<redacted>/data/_delta_log,-1,List(),None,-1), checksumOpt=None)
2024-03-10 13:06:44 INFO org.apache.xtable.client.OneTableClient:234 - No previous OneTable sync for target. Falling back to snapshot sync.
2024-03-10 13:06:45 INFO org.apache.iceberg.CatalogUtil:302 - Loading custom FileIO implementation: org.apache.iceberg.aws.s3.S3FileIO
2024-03-10 13:06:46 INFO org.apache.iceberg.BaseMetastoreTableOperations:199 - Refreshing table metadata from new version: s3://<redacted>/metadata/00116-d628cb20-f502-445e-a87a-b0639b830259.metadata.json
2024-03-10 13:06:47 INFO org.apache.iceberg.BaseMetastoreCatalog:67 - Table loaded by catalog: onetable.<redacted>
2024-03-10 13:06:47 INFO org.apache.iceberg.SnapshotScan:124 - Scanning table onetable.<redacted> snapshot 8863237097523144183 created at 2024-03-10T02:42:10.959+00:00 with filter true
2024-03-10 13:06:48 INFO org.apache.iceberg.metrics.LoggingMetricsReporter:38 - Received metrics report: ScanReport{tableName=onetable.<redacted>, snapshotId=8863237097523144183, filter=true, schemaId=0, projectedFieldIds=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22], projectedFieldNames=[timestamp, uid, sensor_name, source_ip, source_port, destination_ip, destination_port, proto, service, duration, source_bytes, destination_bytes, conn_state, local_source, local_destination, missed_bytes, history, source_pkts, source_ip_bytes, destination_pkts, destination_ip_bytes, tunnel_parents], scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT1.291327918S, count=1}, resultDataFiles=CounterResult{unit=COUNT, value=456}, resultDeleteFiles=CounterResult{unit=COUNT, value=0}, totalDataManifests=CounterResult{unit=COUNT, value=71}, totalDeleteManifests=CounterResult{unit=COUNT, value=0}, scannedDataManifests=CounterResult{unit=COUNT, value=58}, skippedDataManifests=CounterResult{unit=COUNT, value=13}, totalFileSizeInBytes=CounterResult{unit=BYTES, value=13005026723}, totalDeleteFileSizeInBytes=CounterResult{unit=BYTES, value=0}, skippedDataFiles=CounterResult{unit=COUNT, value=0}, skippedDeleteFiles=CounterResult{unit=COUNT, value=0}, scannedDeleteManifests=CounterResult{unit=COUNT, value=0}, skippedDeleteManifests=CounterResult{unit=COUNT, value=0}, indexedDeleteFiles=CounterResult{unit=COUNT, value=0}, equalityDeleteFiles=CounterResult{unit=COUNT, value=0}, positionalDeleteFiles=CounterResult{unit=COUNT, value=0}}, metadata={iceberg-version=Apache Iceberg 1.4.2 (commit f6bb9173b13424d77e7ad8439b5ef9627e530cb2)}}
2024-03-10 13:06:49 INFO org.apache.spark.sql.delta.DeltaLog:60 - No delta log found for the Delta table at s3://<redacted>/data/_delta_log
2024-03-10 13:06:49 INFO org.apache.spark.sql.delta.InitialSnapshot:60 - [tableId=608fba2e-531b-455c-8a92-e14f1086a557] Created snapshot InitialSnapshot(path=s3://<redacted>/data/_delta_log, version=-1, metadata=Metadata(6b778d50-11e8-4490-9892-2d971f078703,null,null,Format(parquet,Map()),null,List(),Map(),Some(1710076009117)), logSegment=LogSegment(s3://<redacted>/data/_delta_log,-1,List(),None,-1), checksumOpt=None)
2024-03-10 13:06:59 ERROR org.apache.xtable.spi.sync.TableFormatSync:78 - Failed to sync snapshot
org.apache.spark.sql.AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "date_format(timestamp, yyyy-MM-dd-HH)" due to data type mismatch: Parameter 1 requires the "TIMESTAMP" type, however "timestamp" has the type "BIGINT".; line 1 pos 0
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.dataTypeMismatch(package.scala:73) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5(CheckAnalysis.scala:269) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5$adapted(CheckAnalysis.scala:256) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:295) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:294) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:294) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.Iterator.foreach(Iterator.scala:943) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.Iterator.foreach$(Iterator.scala:943) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.IterableLike.foreach(IterableLike.scala:74) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.IterableLike.foreach$(IterableLike.scala:73) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.AbstractIterable.foreach(Iterable.scala:56) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:294) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$4(CheckAnalysis.scala:256) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$4$adapted(CheckAnalysis.scala:256) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.immutable.Stream.foreach(Stream.scala:533) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$1(CheckAnalysis.scala:256) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$1$adapted(CheckAnalysis.scala:163) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:295) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0(CheckAnalysis.scala:163) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0$(CheckAnalysis.scala:160) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis0(Analyzer.scala:188) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:156) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:146) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:188) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.GeneratedColumn$.validateColumnReferences(GeneratedColumn.scala:185) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.GeneratedColumn$.$anonfun$validateGeneratedColumns$2(GeneratedColumn.scala:214) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.collection.immutable.List.map(List.scala:293) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.GeneratedColumn$.validateGeneratedColumns(GeneratedColumn.scala:210) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.$anonfun$assertMetadata$2(OptimisticTransaction.scala:611) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:140) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:138) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.recordFrameProfile(OptimisticTransaction.scala:137) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:133) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:128) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:117) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.recordOperation(OptimisticTransaction.scala:137) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:132) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:122) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:112) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.recordDeltaOperation(OptimisticTransaction.scala:137) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.assertMetadata(OptimisticTransaction.scala:611) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.assertMetadata$(OptimisticTransaction.scala:583) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.assertMetadata(OptimisticTransaction.scala:137) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.updateMetadataInternal(OptimisticTransaction.scala:565) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.updateMetadataInternal$(OptimisticTransaction.scala:395) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.updateMetadataInternal(OptimisticTransaction.scala:137) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.updateMetadata(OptimisticTransaction.scala:379) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransactionImpl.updateMetadata$(OptimisticTransaction.scala:372) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.spark.sql.delta.OptimisticTransaction.updateMetadata(OptimisticTransaction.scala:137) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.xtable.delta.DeltaClient$TransactionState.commitTransaction(DeltaClient.java:261) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.xtable.delta.DeltaClient$TransactionState.access$300(DeltaClient.java:217) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.xtable.delta.DeltaClient.completeSync(DeltaClient.java:199) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.xtable.spi.sync.TableFormatSync.getSyncResult(TableFormatSync.java:165) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.xtable.spi.sync.TableFormatSync.syncSnapshot(TableFormatSync.java:70) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.xtable.client.OneTableClient.syncSnapshot(OneTableClient.java:179) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.xtable.client.OneTableClient.sync(OneTableClient.java:116) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at org.apache.xtable.utilities.RunSync.main(RunSync.java:168) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
2024-03-10 13:06:59 ERROR org.apache.xtable.client.OneTableClient:133 - Sync failed for the following formats DELTA
@ForeverAngry the change in paths is expected and required as we transition this project to the Apache Software Foundation. We will need to do a full revamp of the docs/references once this process is complete.
Did you try building off of the branch I linked as well?
@the-other-tim-brown Well, I feel a bit foolish - for some reason I thought you only wanted me to re-pull and test the main branch. Sorry, I'll pull the branch you linked and let you know!
Again, I appreciate the dialog and support - you're awesome!
@the-other-tim-brown Okay, I rebuilt the jar using the branch you provided; here is the stack trace:
Output: 2024-03-10 15:59:14 INFO io.onetable.utilities.RunSync:147 - Running sync for basePath s3://<redacted> for following table formats [DELTA]
2024-03-10 15:59:15 WARN org.apache.hadoop.util.NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-03-10 15:59:21 WARN org.apache.hadoop.metrics2.impl.MetricsConfig:136 - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2024-03-10 15:59:22 INFO org.apache.spark.sql.delta.storage.DelegatingLogStore:60 - LogStore `LogStoreAdapter(io.delta.storage.S3SingleDriverLogStore)` is used for scheme `s3`
2024-03-10 15:59:23 INFO org.apache.spark.sql.delta.DeltaLog:60 - Creating initial snapshot without metadata, because the directory is empty
2024-03-10 15:59:25 INFO org.apache.spark.sql.delta.InitialSnapshot:60 - [tableId=f5cd0f71-bf22-4b57-8447-b5520c7b7d99] Created snapshot InitialSnapshot(path=s3://<redacted>/data/_delta_log, version=-1, metadata=Metadata(23f44368-3cd6-451f-af9a-d87c7cd8ee9a,null,null,Format(parquet,Map()),null,List(),Map(),Some(1710086365401)), logSegment=LogSegment(s3://<redacted>/data/_delta_log,-1,List(),None,-1), checksumOpt=None)
2024-03-10 15:59:25 INFO io.onetable.client.OneTableClient:234 - No previous OneTable sync for target. Falling back to snapshot sync.
2024-03-10 15:59:26 INFO org.apache.iceberg.CatalogUtil:302 - Loading custom FileIO implementation: org.apache.iceberg.aws.s3.S3FileIO
2024-03-10 15:59:27 INFO org.apache.iceberg.BaseMetastoreTableOperations:199 - Refreshing table metadata from new version: s3://<redacted>/metadata/00116-d628cb20-f502-445e-a87a-b0639b830259.metadata.json
2024-03-10 15:59:27 INFO org.apache.iceberg.BaseMetastoreCatalog:67 - Table loaded by catalog: onetable.<redacted>
2024-03-10 15:59:27 INFO org.apache.iceberg.SnapshotScan:124 - Scanning table onetable.<redacted> snapshot 8863237097523144183 created at 2024-03-10T02:42:10.959+00:00 with filter true
2024-03-10 15:59:29 INFO org.apache.iceberg.metrics.LoggingMetricsReporter:38 - Received metrics report: ScanReport{tableName=onetable.<redacted>, snapshotId=8863237097523144183, filter=true, schemaId=0, projectedFieldIds=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22], projectedFieldNames=[timestamp, uid, sensor_name, source_ip, source_port, destination_ip, destination_port, proto, service, duration, source_bytes, destination_bytes, conn_state, local_source, local_destination, missed_bytes, history, source_pkts, source_ip_bytes, destination_pkts, destination_ip_bytes, tunnel_parents], scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT1.189127061S, count=1}, resultDataFiles=CounterResult{unit=COUNT, value=456}, resultDeleteFiles=CounterResult{unit=COUNT, value=0}, totalDataManifests=CounterResult{unit=COUNT, value=71}, totalDeleteManifests=CounterResult{unit=COUNT, value=0}, scannedDataManifests=CounterResult{unit=COUNT, value=58}, skippedDataManifests=CounterResult{unit=COUNT, value=13}, totalFileSizeInBytes=CounterResult{unit=BYTES, value=13005026723}, totalDeleteFileSizeInBytes=CounterResult{unit=BYTES, value=0}, skippedDataFiles=CounterResult{unit=COUNT, value=0}, skippedDeleteFiles=CounterResult{unit=COUNT, value=0}, scannedDeleteManifests=CounterResult{unit=COUNT, value=0}, skippedDeleteManifests=CounterResult{unit=COUNT, value=0}, indexedDeleteFiles=CounterResult{unit=COUNT, value=0}, equalityDeleteFiles=CounterResult{unit=COUNT, value=0}, positionalDeleteFiles=CounterResult{unit=COUNT, value=0}}, metadata={iceberg-version=Apache Iceberg 1.4.2 (commit f6bb9173b13424d77e7ad8439b5ef9627e530cb2)}}
2024-03-10 15:59:29 INFO org.apache.spark.sql.delta.DeltaLog:60 - No delta log found for the Delta table at s3://<redacted>/data/_delta_log
2024-03-10 15:59:29 INFO org.apache.spark.sql.delta.InitialSnapshot:60 - [tableId=23f44368-3cd6-451f-af9a-d87c7cd8ee9a] Created snapshot InitialSnapshot(path=s3://<redacted>/data/_delta_log, version=-1, metadata=Metadata(f152cc9d-0cd4-4692-bc95-68866870b6ef,null,null,Format(parquet,Map()),null,List(),Map(),Some(1710086369624)), logSegment=LogSegment(s3://<redacted>/data/_delta_log,-1,List(),None,-1), checksumOpt=None)
2024-03-10 15:59:29 ERROR io.onetable.spi.sync.TableFormatSync:78 - Failed to sync snapshot
io.onetable.exception.NotSupportedException: Unsupported type: timestamp_ntz
at io.onetable.delta.DeltaSchemaExtractor.toOneSchema(DeltaSchemaExtractor.java:274) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.delta.DeltaSchemaExtractor.lambda$toOneSchema$9(DeltaSchemaExtractor.java:203) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_342]
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[?:1.8.0_342]
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[?:1.8.0_342]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_342]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_342]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_342]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_342]
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) ~[?:1.8.0_342]
at io.onetable.delta.DeltaSchemaExtractor.toOneSchema(DeltaSchemaExtractor.java:217) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.delta.DeltaSchemaExtractor.toOneSchema(DeltaSchemaExtractor.java:145) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.delta.DeltaClient$TransactionState.addColumn(DeltaClient.java:242) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.delta.DeltaClient$TransactionState.access$200(DeltaClient.java:217) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.delta.DeltaClient.syncPartitionSpec(DeltaClient.java:170) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.spi.sync.TableFormatSync.getSyncResult(TableFormatSync.java:158) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.spi.sync.TableFormatSync.syncSnapshot(TableFormatSync.java:70) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.client.OneTableClient.syncSnapshot(OneTableClient.java:179) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.client.OneTableClient.sync(OneTableClient.java:116) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
at io.onetable.utilities.RunSync.main(RunSync.java:168) [utilities-0.1.0-SNAPSHOT-bundled.jar:?]
2024-03-10 15:59:29 ERROR io.onetable.client.OneTableClient:133 - Sync failed for the following formats DELTA
@ForeverAngry I missed a spot but just updated the branch to handle this error you are seeing.
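For context, the "Unsupported type: timestamp_ntz" failure above is the Delta schema extractor hitting Spark's TIMESTAMP_NTZ (timestamp without time zone) type and falling through to NotSupportedException. The snippet below is only a minimal, hypothetical sketch of the kind of type-name mapping such a fix needs; the class, enum, and method names are made up for illustration and this is not the actual patch from the linked branch.

// Minimal, illustrative sketch only (not the actual patch): map Spark's timestamp
// type names to an internal notion of "UTC-adjusted" vs "no time zone" instead of
// rejecting timestamp_ntz outright. All names here are hypothetical.
public class TimestampNtzMappingSketch {
  enum InternalTimestampKind { UTC_ADJUSTED, NO_TIMEZONE }

  static InternalTimestampKind mapTimestamp(String sparkTypeName) {
    switch (sparkTypeName) {
      case "timestamp":
        // Spark's TIMESTAMP has local-time-zone (UTC-adjusted) semantics.
        return InternalTimestampKind.UTC_ADJUSTED;
      case "timestamp_ntz":
        // The failing case from the log: timestamp without time zone.
        return InternalTimestampKind.NO_TIMEZONE;
      default:
        throw new UnsupportedOperationException("Unsupported type: " + sparkTypeName);
    }
  }

  public static void main(String[] args) {
    System.out.println(mapTimestamp("timestamp"));      // UTC_ADJUSTED
    System.out.println(mapTimestamp("timestamp_ntz"));  // NO_TIMEZONE
  }
}

Note that writing such columns out to Delta generally also requires a Spark/Delta version that supports TIMESTAMP_NTZ, so the versions bundled into the utilities jar matter as well.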