
[Bug]: S3 paths with special characters are not escaped

Open c-thiel opened this issue 11 months ago • 7 comments

What happened

Maybe related to https://github.com/projectnessie/nessie/issues/8524

In our deployment, a user created a namespace with a question mark in its name. Since then we cannot run GC anymore. I managed to reproduce it in our tests. GC fails with:

Caused by: java.lang.RuntimeException: Failed to get manifest files for ICEBERG_TABLE tests.a.C?mpl Name .table, content-ID 0749c4e9-ac98-4c82-ad2f-1bc387f5045f at commit 4696d1290522386c8ee4db51c64c1fd3ed567351a0026a5fefb6154faa1629d2 via s3://wyrm-lakehouse-pytest/tests.a.C?mpl Name /table_2559f350-e7c7-4216-b29e-5033091a4d06/metadata/00000-6cc479b6-0667-4c01-93e8-d59ed46c4c4c.metadata.json
...
Caused by: org.apache.iceberg.exceptions.RuntimeIOException: Failed to open file: s3://wyrm-lakehouse-pytest/tests.a.C?mpl Name /table_2559f350-e7c7-4216-b29e-5033091a4d06/metadata/snap-6014480353781191283-1-fe723339-fd23-4db3-a534-95f1de851af9.avro

The file exists. It seems to me that the ? is not escaped, and thus the rest of the path is interpreted as a query argument.
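
For illustration, a minimal Java sketch of the behavior I suspect (the bucket and key names are made up): java.net.URI treats everything after an unescaped ? as the query component, which truncates the object key exactly like this.

import java.net.URI;

public class QueryTruncation {
    public static void main(String[] args) {
        // Hypothetical S3 location with an unescaped '?' in a namespace name.
        URI uri = URI.create("s3://my-bucket/ns?test/table/metadata/snap.avro");
        System.out.println("path  = " + uri.getPath());   // prints: /ns
        System.out.println("query = " + uri.getQuery());  // prints: test/table/metadata/snap.avro
    }
}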

Is there a flag to make GC altogether more robust? I don't want it to abort halfway through due to a single table failure.

How to reproduce it

  1. Create a namespace "a" with a subnamespace "C?mpl3 Name " (I used Spark)
  2. Create a table in it called "table"
  3. Drop the table
  4. Run Nessie gc

Nessie server type (docker/uber-jar/built from source) and version

docker

Client type (Ex: UI/Spark/pynessie ...) and version

None - Nessie gc

Additional information

Nessie 0.99.0

c-thiel avatar Jan 05 '25 10:01 c-thiel

Sorry, @c-thiel, I was not able to reproduce this problem with the current main code (local build) :thinking:

Could you share your full exception stack trace and exact gc CLI arguments (minus secret stuff, of course)?

dimas-b avatar Jan 06 '25 16:01 dimas-b

Hm, strange.

Here is the callstack:

java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Failed to get manifest files for ICEBERG_TABLE tests.2023-10-13-101415-edd293e4-1eee-47d7-b964-171340e13ae9.a.C?mpl Name .table, content-ID 0749c4e9-ac98-4c82-ad2f-1bc387f5045f at commit f150347568bc34bfd27b611d6588a00bf472e7bd76c477b3b96864deddf3ed5d via s3://my-bucket/tests.2023-10-13-101415-edd293e4-1eee-47d7-b964-171340e13ae9.a.C?mpl Name /table_2559f350-e7c7-4216-b29e-5033091a4d06/metadata/00000-6cc479b6-0667-4c01-93e8-d59ed46c4c4c.metadata.json
        at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
        at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:540)
        at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:567)
        at java.base/java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:653)
        at java.base/java.util.concurrent.ForkJoinPool.invoke(ForkJoinPool.java:2822)
        at org.projectnessie.gc.expire.local.DefaultLocalExpire.expire(DefaultLocalExpire.java:75)
        at org.projectnessie.gc.tool.cli.commands.BaseRepositoryCommand.expire(BaseRepositoryCommand.java:250)
        at org.projectnessie.gc.tool.cli.commands.MarkAndSweep.call(MarkAndSweep.java:57)
        at org.projectnessie.gc.tool.cli.commands.BaseRepositoryCommand.call(BaseRepositoryCommand.java:84)
        at org.projectnessie.gc.tool.cli.commands.BaseCommand.call(BaseCommand.java:26)
        at org.projectnessie.gc.tool.cli.commands.BaseCommand.call(BaseCommand.java:21)
        at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
        at picocli.CommandLine.access$1500(CommandLine.java:148)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
        at picocli.CommandLine.execute(CommandLine.java:2174)
        at org.projectnessie.gc.tool.cli.CLI.runMain(CLI.java:104)
        at org.projectnessie.gc.tool.cli.CLI.runMain(CLI.java:75)
        at org.projectnessie.gc.tool.cli.CLI.main(CLI.java:70)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Failed to get manifest files for ICEBERG_TABLE tests.2023-10-13-101415-edd293e4-1eee-47d7-b964-171340e13ae9.a.C?mpl Name .table, content-ID 0749c4e9-ac98-4c82-ad2f-1bc387f5045f at commit f150347568bc34bfd27b611d6588a00bf472e7bd76c477b3b96864deddf3ed5d via s3://my-bucket/tests.2023-10-13-101415-edd293e4-1eee-47d7-b964-171340e13ae9.a.C?mpl Name /table_2559f350-e7c7-4216-b29e-5033091a4d06/metadata/00000-6cc479b6-0667-4c01-93e8-d59ed46c4c4c.metadata.json
        at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
        at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:540)
        at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:567)
        at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:670)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:927)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
        at java.base/java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:657)
        at org.projectnessie.gc.expire.local.DefaultLocalExpire.expireInForkJoinPool(DefaultLocalExpire.java:96)
        at java.base/java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1456)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
Caused by: java.lang.RuntimeException: Failed to get manifest files for ICEBERG_TABLE tests.2023-10-13-101415-edd293e4-1eee-47d7-b964-171340e13ae9.a.C?mpl Name .table, content-ID 0749c4e9-ac98-4c82-ad2f-1bc387f5045f at commit f150347568bc34bfd27b611d6588a00bf472e7bd76c477b3b96864deddf3ed5d via s3://my-bucket/tests.2023-10-13-101415-edd293e4-1eee-47d7-b964-171340e13ae9.a.C?mpl Name /table_2559f350-e7c7-4216-b29e-5033091a4d06/metadata/00000-6cc479b6-0667-4c01-93e8-d59ed46c4c4c.metadata.json
        at org.projectnessie.gc.iceberg.IcebergContentToFiles.lambda$extractTableFiles$0(IcebergContentToFiles.java:170)
        at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
        at java.base/java.util.stream.Streams$StreamBuilderImpl.forEachRemaining(Streams.java:411)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:310)
        at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:735)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
        at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:276)
        at java.base/java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1715)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
        at java.base/java.util.stream.ReduceOps$5.evaluateSequential(ReduceOps.java:258)
        at java.base/java.util.stream.ReduceOps$5.evaluateSequential(ReduceOps.java:248)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.count(ReferencePipeline.java:709)
        at org.projectnessie.gc.expire.PerContentDeleteExpired.identifyLiveFiles(PerContentDeleteExpired.java:136)
        at org.projectnessie.gc.expire.PerContentDeleteExpired.expire(PerContentDeleteExpired.java:73)
        at org.projectnessie.gc.expire.local.DefaultLocalExpire.expireSingleContent(DefaultLocalExpire.java:109)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
        at java.base/java.util.concurrent.ConcurrentHashMap$KeySpliterator.forEachRemaining(ConcurrentHashMap.java:3573)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:960)
        at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:934)
        at java.base/java.util.stream.AbstractTask.compute(AbstractTask.java:327)
        at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754)
        ... 5 more
Caused by: org.apache.iceberg.exceptions.RuntimeIOException: Failed to open file: s3://my-bucket/tests.2023-10-13-101415-edd293e4-1eee-47d7-b964-171340e13ae9.a.C?mpl Name /table_2559f350-e7c7-4216-b29e-5033091a4d06/metadata/snap-6014480353781191283-1-fe723339-fd23-4db3-a534-95f1de851af9.avro
        at org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:104)
        at org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:77)
        at org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:37)
        at org.apache.iceberg.relocated.com.google.common.collect.Iterables.addAll(Iterables.java:333)
        at org.apache.iceberg.relocated.com.google.common.collect.Lists.newLinkedList(Lists.java:242)
        at org.apache.iceberg.ManifestLists.read(ManifestLists.java:45)
        at org.apache.iceberg.BaseSnapshot.cacheManifests(BaseSnapshot.java:146)
        at org.apache.iceberg.BaseSnapshot.allManifests(BaseSnapshot.java:164)
        at org.projectnessie.gc.iceberg.IcebergContentToFiles.allManifests(IcebergContentToFiles.java:268)
        at org.projectnessie.gc.iceberg.IcebergContentToFiles.allManifestsAndDataFiles(IcebergContentToFiles.java:254)
        at org.projectnessie.gc.iceberg.IcebergContentToFiles.lambda$extractTableFiles$0(IcebergContentToFiles.java:155)
        ... 37 more
Caused by: org.apache.avro.InvalidAvroMagicException: Not an Avro data file
        at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:78)
        at org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:102)
        ... 47 more

The gc command is:

        - gc
        - --uri
        - http://....:19120/api/v2
        - --inmemory
        - --default-cutoff
        - PT80H
        - --iceberg
        - $(S3_CREDENTIALS)

The credential has access to the file. I am on version 0.99; if main doesn't have the problem, we could just try an update and see if the problem persists.

c-thiel avatar Jan 08 '25 12:01 c-thiel

@c-thiel : From the stack trace it looks like this is not a problem with S3 URI handling, but might be a problem with the file contents. Could you double-check in the S3 UI/CLI whether that Avro file is correct?
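
If it helps: after downloading the object locally (e.g. via the AWS CLI), a quick way to verify is to compare the first four bytes against the Avro magic that InvalidAvroMagicException complains about. A minimal sketch (the local file path is passed as an argument):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class AvroMagicCheck {
    public static void main(String[] args) throws Exception {
        // A valid Avro data file starts with the 4-byte magic 'O','b','j',1;
        // InvalidAvroMagicException means these bytes did not match.
        byte[] expected = {'O', 'b', 'j', 1};
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            byte[] head = in.readNBytes(4);
            System.out.println(Arrays.equals(head, expected)
                ? "Avro magic OK"
                : "Not an Avro data file, got: " + Arrays.toString(head));
        }
    }
}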

dimas-b avatar Jan 08 '25 14:01 dimas-b

I tried Spark + AWS and created a top-level namespace with a ? and a space in the name.

Then Iceberg fails to insert on the Spark side.

spark-sql ()> create namespace `ns?test abc`;

then

use `ns?test abc`

then

spark-sql (`ns?test abc`)> create table t2 (n string)
                         > ;
Time taken: 1.609 seconds
spark-sql (`ns?test abc`)> insert into t2 values ('t2');
25/01/08 10:57:35 WARN SnapshotProducer: Failed to load committed table metadata or during cleanup, skipping further cleanup
org.apache.iceberg.exceptions.RuntimeIOException: Failed to open file: s3://EXAMPLE/ns?test abc/t2_11c36b10-a8c1-477a-8a91-b90733dad13f/metadata/snap-4983815506953660385-1-b650452a-4aba-45ae-b417-c9a3adc99462.avro
	at org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:104)
	at org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:77)
	at org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:37)
	at org.apache.iceberg.relocated.com.google.common.collect.Iterables.addAll(Iterables.java:333)
	at org.apache.iceberg.relocated.com.google.common.collect.Lists.newLinkedList(Lists.java:258)
	at org.apache.iceberg.ManifestLists.read(ManifestLists.java:45)
	at org.apache.iceberg.BaseSnapshot.cacheManifests(BaseSnapshot.java:146)
	at org.apache.iceberg.BaseSnapshot.allManifests(BaseSnapshot.java:164)
	at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:451)
	at org.apache.iceberg.spark.source.SparkWrite.commitOperation(SparkWrite.java:233)
	at org.apache.iceberg.spark.source.SparkWrite$BatchAppend.commit(SparkWrite.java:301)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:399)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:359)
	at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.writeWithV2(WriteToDataSourceV2Exec.scala:225)
	at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run(WriteToDataSourceV2Exec.scala:337)
	at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run$(WriteToDataSourceV2Exec.scala:336)
	at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.run(WriteToDataSourceV2Exec.scala:225)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:68)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:501)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:619)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:613)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:613)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:310)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:75)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:52)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1029)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.iceberg.shaded.org.apache.avro.InvalidAvroMagicException: Not an Avro data file
	at org.apache.iceberg.shaded.org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:79)
	at org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:102)
	... 74 more
Time taken: 2.957 seconds

So, my guess is that it's a bug in Iceberg code :shrug:

dimas-b avatar Jan 08 '25 15:01 dimas-b

@c-thiel : Does this use case work with a JDBC Iceberg catalog?

dimas-b avatar Jan 08 '25 16:01 dimas-b

> @c-thiel : Does this use case work with a JDBC Iceberg catalog?

I haven't tried actually. Thanks for investigating @dimas-b.

I don't really care much about this specific use case, but if someone manages to create a broken table (a very small fraction), it would be great if Nessie continued with GC instead of aborting. I'm also not really sure how I can get GC to run normally again without touching MongoDB.

c-thiel avatar Jan 08 '25 16:01 c-thiel

Nessie GC uses Iceberg jars for metadata access, very similarly to how Spark uses them.

So it is conceivable that, for example, Python might be able to write something that Iceberg Java would not be able to read :shrug:
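
For context, a minimal sketch of how table metadata is loaded through the Iceberg API (TableMetadataParser and FileIO are real Iceberg classes; the wrapper method here is just illustrative). The snapshot and manifest-list locations are taken verbatim from the metadata JSON, so any unescaped ? in those paths travels along into the subsequent reads:

import org.apache.iceberg.TableMetadata;
import org.apache.iceberg.TableMetadataParser;
import org.apache.iceberg.io.FileIO;

public class MetadataLoad {
    // Illustrative wrapper: reads the table metadata JSON through whatever
    // FileIO implementation is configured (e.g. S3FileIO).
    static TableMetadata load(FileIO io, String metadataLocation) {
        return TableMetadataParser.read(io, metadataLocation);
    }
}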

dimas-b avatar Jan 08 '25 17:01 dimas-b