Ajantha Bhat
GC identification has two steps: 1. identify live contents and add them to a bloom filter; 2. identify expired contents. Currently, for step 2, `commitProtectionDuration` is used to avoid the new...
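The two steps above can be sketched roughly as follows. This is a minimal illustration, not Nessie's GC code: the `BloomFilter` class, `identify_expired`, and the string content IDs are all hypothetical, and the `commitProtectionDuration` check is left out because the issue text is truncated at that point.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter for illustration (hypothetical, not Nessie's class)."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions by salting a SHA-256 hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


def identify_expired(all_content_ids, live_content_ids):
    # Step 1: fill the bloom filter with live contents.
    live_filter = BloomFilter()
    for content_id in live_content_ids:
        live_filter.add(content_id)
    # Step 2: anything the filter has definitely not seen is expired.
    return [cid for cid in all_content_ids if not live_filter.might_contain(cid)]


expired = identify_expired(["a", "b", "c", "d"], ["a", "c"])
```

A bloom filter has no false negatives, so live contents are never reported as expired; a false positive only means some expired content is kept for another GC run.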
to benchmark #3421 and create a procedure for benchmarking the GC logic. We should do it before a critical change is introduced to a GC component. It depends on: https://github.com/projectnessie/nessie/issues/3764...
In the Nessie catalog, each branch can have a new table with the same name as a table in another branch. When this happens, Nessie uses the same table path...
Scenario: `CREATE TABLE %s (id bigint, data string, category string) USING iceberg PARTITIONED BY (category) TBLPROPERTIES('format-version' = '2')` `ALTER TABLE %s DROP PARTITION FIELD category` `ALTER TABLE %s DROP COLUMN...
Environment: - Nessie 0.43.0 (started with `./gradlew quarkusDev`) - Iceberg 0.14.1 plus custom code to send huge metadata if the table name contains the keyword 'big' (https://github.com/ajantha-bhat/iceberg/commit/c30b5412f8e29a7da88c4d2562c5ae7e2b7dd68f) - Spark 3.3 Query:...
In Iceberg, `TableMetadata` grows at an average rate of 1KB per commit because of the `snapshots`, `snapshot-log`, and `metadata-log` fields (the default maximum for `metadata-log` is 100 entries). For non-Nessie catalogs, `expire_snapshots`...
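To make the growth rate concrete, here is a back-of-the-envelope sketch. It assumes a purely linear model at the 1KB-per-commit average quoted above, ignores the `metadata-log` cap, and uses hypothetical function names and a hypothetical `base_kb` starting size. Since each Iceberg commit writes a new metadata file, the second function sums the sizes of all files written so far.

```python
def metadata_size_kb(commits, base_kb=0.0, growth_kb_per_commit=1.0):
    """Estimated size of the latest metadata file after `commits` commits."""
    return base_kb + commits * growth_kb_per_commit


def cumulative_metadata_kb(commits, base_kb=0.0, growth_kb_per_commit=1.0):
    """Estimated total size of all metadata files written, one per commit."""
    return sum(
        metadata_size_kb(i, base_kb, growth_kb_per_commit)
        for i in range(1, commits + 1)
    )


latest_kb = metadata_size_kb(1000)        # 1000.0 KB, roughly 1 MB
total_kb = cumulative_metadata_kb(1000)   # 500500.0 KB, roughly 500 MB
```

Under these assumptions the latest file grows linearly, while the total written across all commits grows quadratically unless old metadata is cleaned up.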
In `IdentifyLiveContents`, we add the files of a dropped table to the live contents when the table's drop time is newer than the cutoff time or cutoff commit. This leads to not cleaning...
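The condition described here can be illustrated with a small sketch. The function name and the timestamps are hypothetical; it only mirrors the comparison stated in the issue and is not the actual `IdentifyLiveContents` code.

```python
from datetime import datetime, timezone


def dropped_table_kept_live(drop_time, cutoff_time):
    """Return True if a dropped table's files are still treated as live.

    Mirrors the behaviour described in the issue: a table dropped after
    the cutoff is still added to the live contents.
    """
    return drop_time > cutoff_time


cutoff = datetime(2022, 9, 1, tzinfo=timezone.utc)
dropped_after_cutoff = datetime(2022, 9, 5, tzinfo=timezone.utc)
dropped_before_cutoff = datetime(2022, 8, 20, tzinfo=timezone.utc)

kept = dropped_table_kept_live(dropped_after_cutoff, cutoff)       # True
cleaned = dropped_table_kept_live(dropped_before_cutoff, cutoff)   # False
```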
Only merge is supported from the SQL extensions, so we can add support for cherry-pick or transplant from SQL, similar to the Python CLI.
The current Arrow integration with Carbon supports complex types and primitive types, and the conversion from carbonInternalRow to Arrow vectors happens at the top layer. If the Arrow vector is filled while rows are...