incubator-xtable
Unable to create delta log metadata files for Iceberg table
We have created Iceberg data in a GCS bucket, which includes a 'data' folder and a 'metadata' folder. After cloning the 'onetable' repository, I am attempting to generate Delta metadata for this table. Although the process creates the '_delta_log' folder under the table's base path, it does not generate any Delta log files, and I am encountering the following error:
error: -
my_config.yaml -
sourceFormat: ICEBERG
targetFormats:
  - DELTA
datasets:
  - tableBasePath: gs://iceberg_data/test.db/people
    tableDataPath: gs://iceberg_data/test.db/people/data
    tableName: people
    partitionSpec: date:VALUE
@Manoranjan93 are you using a catalog for your Iceberg table? Also can you confirm whether that version hint file exists in GCS?
@the-other-tim-brown - Yes, we are using a catalog for the Iceberg table. In GCS, under the table directory, there are two main folders: 'data' and 'metadata'. The 'data' folder contains partition folders, each holding Parquet files or other data files. The 'metadata' folder contains the 'metadata.json' file along with two additional '.avro' files. Here is a sample of the folder structure in GCS:
ls iceberg/
data/ metadata/
ls iceberg/data
creation_date=2015-01-10/ creation_date=2015-01-11/
ls iceberg/metadata
00000-55gcadf-xxxx-xxx.metadata.json 0swscs-kcdedc-xxxx-xxxx-m0.avro snap-xxx-x-xxx.avro
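The listing above hints at why the version hint file came up. As a rough sketch, assuming the usual Iceberg naming conventions (path-based Hadoop tables write `v<N>.metadata.json` files plus a `version-hint.text` pointer, while catalog-managed tables write `<5-digit version>-<uuid>.metadata.json`), the layout can be guessed from the file names alone. The function name `guess_metadata_layout` is hypothetical and not part of any tool mentioned in this thread.

```python
import re

# Assumed naming conventions (not taken from this thread):
#   catalog-managed tables: "<5-digit version>-<uuid>.metadata.json"
#   path-based (Hadoop) tables: "v<N>.metadata.json" plus "version-hint.text"
CATALOG_STYLE = re.compile(r"^\d{5}-.+\.metadata\.json$")
HADOOP_STYLE = re.compile(r"^v\d+\.metadata\.json$")


def guess_metadata_layout(filenames):
    """Return 'hadoop', 'catalog', or 'unknown' for a metadata/ listing."""
    names = set(filenames)
    if "version-hint.text" in names or any(HADOOP_STYLE.match(n) for n in names):
        return "hadoop"
    if any(CATALOG_STYLE.match(n) for n in names):
        return "catalog"
    return "unknown"


# File names copied from the listing in this thread.
listing = [
    "00000-55gcadf-xxxx-xxx.metadata.json",
    "0swscs-kcdedc-xxxx-xxxx-m0.avro",
    "snap-xxx-x-xxx.avro",
]
print(guess_metadata_layout(listing))
```

A 'catalog' result with no `version-hint.text` present suggests the table has to be loaded through its catalog rather than by path alone.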
@Manoranjan93 did you pass in the iceberg catalog configuration options for your source? If you are using the RunSync tool you can pass in -i catalog_options.yaml where that file is structured like:
catalogImpl: com.my.catalog.Impl
catalogName: name
catalogOptions:
  option1: value1
  option2: value2
@the-other-tim-brown - I have already tried creating a catalog.yaml file for the source Iceberg table, as shown below, and with it I am able to generate a Delta log file. Could you please review the following setup and confirm whether this is the expected behavior? Also, please verify that I am performing the steps correctly.
catalog.yaml file -
catalogImpl: org.apache.iceberg.gcp.biglake.BigLakeCatalog
catalogName: iceberg
catalogOptions:
  gcp_project: <yourProjectName>
  gcp_location: us-west1
  warehouse: gs://path/to/warehouse
my_config.yaml
sourceFormat: ICEBERG
targetFormats:
  - DELTA
datasets:
  - tableBasePath: gs://bucket_name/db/table_name
    tableDataPath: gs://bucket_name/db/table_name/data
    tableName: table_name
    namespace: biglake_dataset_id
    partitionSpec: date:VALUE
I am using the Google Cloud Shell Editor to run all the commands.
java -cp utilities/target/utilities-0.1.0-SNAPSHOT-bundled.jar:/home/fexxx/onetable/biglake-catalog-iceberg1.2.0-0.1.0-with-dependencies.jar io.onetable.utilities.RunSync --datasetConfig my_config.yaml --icebergCatalogConfig catalog.yaml
@Manoranjan93 is there content in the _delta_log/00...000.json file? The steps you are following are what I would expect but I have not tested this path with BLMS. If there are entries in the json, then you can try reading the table as delta from dataproc to validate it is readable.
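One way to check whether the commit file has meaningful entries, before trying a full read, is to count the actions it contains. This is a minimal sketch, assuming the commit file follows the Delta log layout of one JSON object per line with a single top-level key naming the action (for example `protocol`, `metaData`, `add`). The sample content and the helper name `summarize_delta_commit` are hypothetical; in practice the text would come from the downloaded `_delta_log` JSON file.

```python
import json
from collections import Counter


def summarize_delta_commit(text):
    """Count the actions in one Delta commit file (newline-delimited JSON)."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        action = json.loads(line)
        # Each line is assumed to be an object keyed by its action type.
        counts.update(action.keys())
    return counts


# Hypothetical commit content, shaped like a first Delta commit.
sample = "\n".join(
    json.dumps(action)
    for action in [
        {"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}},
        {"metaData": {"id": "hypothetical-table-id"}},
        {"add": {"path": "creation_date=2015-01-10/part-0.parquet", "dataChange": True}},
    ]
)

counts = summarize_delta_commit(sample)
print(dict(counts))
```

A commit that contains a `metaData` entry and at least one `add` entry suggests the sync wrote table metadata and registered data files; reading the table with a Delta reader is still the real validation.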
@the-other-tim-brown -- I have content in the _delta_log/0000..00.json
I will try to read the table as delta.
@Manoranjan93 were you able to get this working?
@the-other-tim-brown Yes, I was able to read it as a Delta table using Spark.
@Manoranjan93 close issue?