incubator-xtable

Unable to create delta log metadata files for Iceberg table

Open • Manoranjan93 opened this issue 1 year ago • 9 comments

We have created an Iceberg table in a GCS bucket, and its directory contains a 'data' folder and a 'metadata' folder. After cloning the 'onetable' repository, I am trying to generate Delta metadata for this table. The process creates the '_delta_log' folder under the table's base path, but it does not generate any Delta log files, and I get the following error:

Error: [screenshot]

my_config.yaml:

sourceFormat: ICEBERG
targetFormats:
  - DELTA
datasets:
  - tableBasePath: gs://iceberg_data/test.db/people
    tableDataPath: gs://iceberg_data/test.db/people/data
    tableName: people
    partitionSpec: date:VALUE
cloud run

Manoranjan93, Jan 18 '24 09:01

@Manoranjan93 are you using a catalog for your Iceberg table? Also, can you confirm whether the version hint file mentioned in the error exists in GCS?
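Without a catalog, the table has to be resolved from its path alone, and Iceberg's path-based (Hadoop table) layout keeps a `metadata/version-hint.text` file that names the current metadata version; that is presumably the file being asked about. A minimal sketch of checking for it, assuming you already have the object listing (for example the lines printed by `gsutil ls gs://iceberg_data/test.db/people/metadata/`). Both helper names are hypothetical:

```python
from typing import Iterable

# Pointer file used by Iceberg's path-based (Hadoop table) layout.
VERSION_HINT = "version-hint.text"


def has_version_hint(object_names: Iterable[str]) -> bool:
    """True if any listed object is <table>/metadata/version-hint.text."""
    return any(
        name.rstrip("/").endswith("/metadata/" + VERSION_HINT)
        for name in object_names
    )


def hadoop_metadata_file_for(hint_text: str) -> str:
    """Map the contents of version-hint.text (a version number) to the
    metadata file name a Hadoop table uses, e.g. "3" -> "v3.metadata.json"."""
    return f"v{int(hint_text.strip())}.metadata.json"
```

If `has_version_hint` returns False, the table was most likely not written as a path-based Hadoop table, so a catalog configuration is needed to locate its current metadata file.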

the-other-tim-brown, Jan 19 '24 15:01

@the-other-tim-brown Yes, we are using a catalog for the Iceberg table. In GCS, the table directory contains two main folders, 'data' and 'metadata'. The 'data' folder holds partition folders, each containing Parquet (or other) data files. The 'metadata' folder holds the 'metadata.json' file along with two '.avro' files. Here is a sample of the folder structure in GCS:

ls iceberg/
data/  metadata/

ls iceberg/data
creation_date=2015-01-10/  creation_date=2015-01-11/

ls iceberg/metadata
00000-55gcadf-xxxx-xxx.metadata.json  0swscs-kcdedc-xxxx-xxxx-m0.avro  snap-xxx-x-xxx.avro
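For reference, Iceberg names metadata files differently depending on how the table is managed: catalog-managed tables use a zero-padded sequence number plus a UUID (like the `00000-55gcadf-xxxx-xxx.metadata.json` file in the listing), while path-based Hadoop tables use `v<N>.metadata.json` next to `version-hint.text`. A rough classifier for those two patterns, as a sketch only (the function names are hypothetical, and gzip-compressed Hadoop-style names are not covered):

```python
import re
from typing import Iterable, Optional

# Path-based Hadoop table layout: v<N>.metadata.json, e.g. v3.metadata.json.
HADOOP_STYLE = re.compile(r"^v(\d+)\.metadata\.json$")
# Catalog-managed layout: <zero-padded sequence>-<uuid>.metadata.json.
CATALOG_STYLE = re.compile(r"^(\d{5,})-[^/]+\.metadata\.json$")


def metadata_naming_style(filename: str) -> str:
    """Classify a metadata file name as 'hadoop', 'catalog' or 'unknown'."""
    if HADOOP_STYLE.match(filename):
        return "hadoop"
    if CATALOG_STYLE.match(filename):
        return "catalog"
    return "unknown"


def latest_catalog_metadata(filenames: Iterable[str]) -> Optional[str]:
    """Return the catalog-style metadata file with the highest sequence
    number, or None if no name matches that pattern."""
    best: Optional[str] = None
    best_seq = -1
    for name in filenames:
        match = CATALOG_STYLE.match(name)
        if match and int(match.group(1)) > best_seq:
            best, best_seq = name, int(match.group(1))
    return best
```

A listing that contains only catalog-style names and no `version-hint.text` suggests the table's current state is tracked by a catalog rather than by the file system.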

Manoranjan93, Jan 21 '24 08:01

@Manoranjan93 did you pass in the Iceberg catalog configuration options for your source? If you are using the RunSync tool, you can pass in -i catalog_options.yaml, where that file is structured like:

catalogImpl: com.my.catalog.Impl
catalogName: name
catalogOptions:
  option1: value1
  option2: value2
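If you generate this file from a script, a minimal sketch that emits exactly this structure could look like the following. `render_catalog_config` is a hypothetical helper, and it does no YAML quoting, so values containing `: ` or starting with YAML indicator characters would need quotes (a YAML library handles that for you):

```python
from typing import Dict


def render_catalog_config(catalog_impl: str, catalog_name: str,
                          options: Dict[str, str]) -> str:
    """Render the catalog options file: top-level catalogImpl and
    catalogName, plus a nested catalogOptions map indented by two spaces."""
    lines = [
        f"catalogImpl: {catalog_impl}",
        f"catalogName: {catalog_name}",
        "catalogOptions:",
    ]
    for key, value in options.items():  # dicts keep insertion order
        lines.append(f"  {key}: {value}")
    return "\n".join(lines) + "\n"
```

For example, `render_catalog_config("com.my.catalog.Impl", "name", {"option1": "value1", "option2": "value2"})` reproduces the sample file line for line.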

the-other-tim-brown, Jan 23 '24 01:01

@the-other-tim-brown I created a catalog.yaml file for the Iceberg source, shown below, and with it I am able to generate a Delta log file. Could you please review the following scenarios and confirm whether this is the expected behavior? Please also verify whether I am performing the steps correctly.

catalog.yaml:

catalogImpl: org.apache.iceberg.gcp.biglake.BigLakeCatalog
catalogName: iceberg
catalogOptions:
  gcp_project: <yourProjectName>
  gcp_location: us-west1
  warehouse: gs://path/to/warehouse

my_config.yaml:

sourceFormat: ICEBERG
targetFormats:
  - DELTA
datasets:
  - tableBasePath: gs://bucket_name/db/table_name
    tableDataPath: gs://bucket_name/db/table_name/data
    tableName: table_name
    namespace: biglake_dataset_id
    partitionSpec: date:VALUE

I am using the Google Cloud Shell Editor to run all the commands:

java -cp utilities/target/utilities-0.1.0-SNAPSHOT-bundled.jar:/home/fexxx/onetable/biglake-catalog-iceberg1.2.0-0.1.0-with-dependencies.jar io.onetable.utilities.RunSync --datasetConfig my_config.yaml --icebergCatalogConfig catalog.yaml
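The same invocation can be assembled programmatically, which avoids typos in the `:`-separated classpath. A sketch that uses only the flags and main class from the command above; `build_runsync_command` is a hypothetical helper, and the jar paths are whatever your build produced:

```python
import os
from typing import List, Optional, Sequence


def build_runsync_command(
    classpath: Sequence[str],
    dataset_config: str,
    catalog_config: Optional[str] = None,
    main_class: str = "io.onetable.utilities.RunSync",
) -> List[str]:
    """Build the argv for the RunSync call; the catalog flag is only
    added when a catalog config file is given."""
    cmd = [
        "java",
        "-cp",
        os.pathsep.join(classpath),  # ':' on Linux/macOS, ';' on Windows
        main_class,
        "--datasetConfig",
        dataset_config,
    ]
    if catalog_config is not None:
        cmd += ["--icebergCatalogConfig", catalog_config]
    return cmd
```

Run the result with `subprocess.run(cmd, check=True)`; passing a list rather than a single string sidesteps shell quoting issues.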

[screenshots]

Manoranjan93, Jan 23 '24 05:01

@Manoranjan93 is there content in the _delta_log/00...000.json file? The steps you are following are what I would expect, but I have not tested this path with BLMS (BigLake Metastore). If there are entries in the JSON, you can try reading the table as Delta from Dataproc to validate that it is readable.
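One quick way to answer the first question is to count the actions in the commit file. Delta commit files under `_delta_log/` are named by a 20-digit zero-padded version and hold one JSON action per line (`protocol`, `metaData`, `add`, `remove`, `commitInfo`, ...); a first commit generated from an existing table would be expected to contain `protocol`, `metaData`, and one `add` per data file. A minimal sketch with hypothetical helper names:

```python
import json
from collections import Counter
from typing import Dict


def delta_commit_filename(version: int) -> str:
    """Commit file name for a table version, e.g. 0 -> 00000000000000000000.json."""
    return f"{version:020d}.json"


def summarize_delta_commit(text: str) -> Dict[str, int]:
    """Count the action types in one Delta commit file (newline-delimited JSON)."""
    counts: Counter = Counter()
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines such as a trailing newline
        action = json.loads(line)
        # Each line is an object whose top-level key names the action.
        counts.update(action.keys())
    return dict(counts)
```

An empty result, or one with no `add` key, would mean the log does not yet reference any data files.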

the-other-tim-brown, Jan 23 '24 15:01

@the-other-tim-brown There is content in the _delta_log/0000..00.json file:

[screenshot]

I will try to read the table as Delta.

Manoranjan93, Jan 25 '24 07:01

@Manoranjan93 were you able to get this working?

the-other-tim-brown, Mar 03 '24 02:03

@the-other-tim-brown Yes, I was able to read it as a Delta table using Spark.

Manoranjan93, May 28 '24 17:05

@Manoranjan93 can we close this issue?

alberttwong, Jun 04 '24 17:06