iceberg-python

Issue Reading Iceberg Tables from Nessie + MinIO Using PyIceberg

Status: Open. heman026 opened this issue 11 months ago • 8 comments

Question

I get the error "Failed to read table metadata from s3a://iceberg-datalake/test/emp_69182e21-1700-4317-9f75-55fca8d57979/metadata/00002-d7b6a027-3d3d-4a1f-9350-ce019969cc2e.metadata.json" when loading a table with the PyIceberg REST catalog. I am using Nessie as the catalog and MinIO for storage.

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "rest",
    **{
        "uri": "http://10.55.134.161:19120/iceberg",  # Nessie's Iceberg REST endpoint
        "s3.endpoint": "http://10.55.134.161:9000",  # MinIO
        "warehouse": "warehouse",
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
    },
)
con = catalog.load_table('test.emp')

Nessie Configuration Used:

java \
-Dquarkus.management.port=9090 \
-Dnessie.version.store.type=JDBC \
-Dquarkus.datasource.jdbc.url=jdbc:postgresql://localhost:5432/nessie_db \
-Dquarkus.datasource.username=nessie \
-Dquarkus.datasource.password=nessie \
-Dnessie.catalog.default-warehouse=warehouse \
-Dnessie.catalog.warehouses.warehouse.location=s3a://iceberg-datalake \
-Dnessie.catalog.service.s3.default-options.endpoint=http://10.55.134.161:9000 \
-Dnessie.catalog.service.s3.default-options.path-style-access=true \
-Dnessie.catalog.service.s3.default-options.access-key=minioadmin \
-Dnessie.catalog.service.s3.default-options.secret-key=minioadmin \
-Dnessie.server.authentication.enabled=false \
-Dnessie.catalog.service.s3.default-options.region=us-east-1 \
-jar nessie-quarkus-0.100.2-runner.jar

Error

Exception has occurred: BadRequestError

IllegalArgumentException: java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to read table metadata from s3a://iceberg-datalake/test/emp_69182e21-1700-4317-9f75-55fca8d57979/metadata/00002-d7b6a027-3d3d-4a1f-9350-ce019969cc2e.metadata.json

File "C:\pyiceberg\catalog\rest.py", line 697, in load_table
    response.raise_for_status()
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://10.55.134.161:19120/iceberg/v1/main%7Cwarehouse/namespaces/test/tables/emp

The above exception was the direct cause of the following exception:

File "C:\pyiceberg\catalog\rest.py", line 476, in _handle_non_200_response
    raise exception(response) from exc
File "C:\Hemanath\KAI\Iceberg Evaluation\Docker\duck\pyiceberg1\catalog\rest.py", line 699, in load_table
    self._handle_non_200_response(exc, {404: NoSuchTableError})
File "C:\duck.py", line 55, in
    con = catalog.load_table('test.emp')
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyiceberg1.exceptions.BadRequestError: IllegalArgumentException: java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to read table metadata from s3a://iceberg-datalake/test/emp_69182e21-1700-4317-9f75-55fca8d57979/metadata/00002-d7b6a027-3d3d-4a1f-9350-ce019969cc2e.metadata.json

Note: I also disabled S3 request signing in Nessie (-Dnessie.catalog.service.s3.default-options.request-signing-enabled=false), but I still get the same error.

Please help me resolve this. Thanks

heman026, Jan 22 '25 05:01
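The report above already contains enough to narrow things down a little. In the Iceberg REST spec the path segment after `v1/` is the catalog prefix, and `%7C` in the failing URL is a URL-encoded `|`; decoding it gives `main|warehouse`, which reads as branch `main` plus warehouse `warehouse` (that interpretation of Nessie's prefix is an assumption based on this URL). The metadata file Nessie failed to read should also sit inside the configured warehouse location (`nessie.catalog.warehouses.warehouse.location`). A minimal sketch of both string-level checks, with hypothetical helper names; it does not contact Nessie or MinIO.

```python
from urllib.parse import unquote, urlsplit

# Values copied from the report above.
REQUEST_URL = "http://10.55.134.161:19120/iceberg/v1/main%7Cwarehouse/namespaces/test/tables/emp"
METADATA = (
    "s3a://iceberg-datalake/test/emp_69182e21-1700-4317-9f75-55fca8d57979/"
    "metadata/00002-d7b6a027-3d3d-4a1f-9350-ce019969cc2e.metadata.json"
)
WAREHOUSE_LOCATION = "s3a://iceberg-datalake"  # nessie.catalog.warehouses.warehouse.location


def rest_path_parts(url: str) -> list[str]:
    """Split a REST request path into URL-decoded segments."""
    return [unquote(seg) for seg in urlsplit(url).path.split("/") if seg]


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Return (bucket, key) for an s3:// or s3a:// URI; the scheme is ignored."""
    parts = urlsplit(uri)
    return parts.netloc, parts.path.lstrip("/")


def is_under(location: str, warehouse: str) -> bool:
    """True if `location` is in the warehouse bucket and below its key prefix."""
    loc_bucket, loc_key = split_s3_uri(location)
    wh_bucket, wh_key = split_s3_uri(warehouse)
    prefix = wh_key.rstrip("/")
    return loc_bucket == wh_bucket and (not prefix or loc_key.startswith(prefix + "/"))


if __name__ == "__main__":
    segments = rest_path_parts(REQUEST_URL)
    # ['iceberg', 'v1', 'main|warehouse', 'namespaces', 'test', 'tables', 'emp']
    print(segments)
    branch, warehouse_name = segments[2].split("|", 1)
    print("branch:", branch, "warehouse:", warehouse_name)
    print("metadata under warehouse:", is_under(METADATA, WAREHOUSE_LOCATION))
```

Both checks come out clean for this report (prefix `main|warehouse`, metadata file in bucket `iceberg-datalake`). That suggests the request reaches the intended warehouse and the 400 comes from Nessie's own attempt to read the object from MinIO, so Nessie's S3 settings look like the place to keep digging.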

@heman026 Thanks for raising this issue. I'm not super familiar with Nessie, but I notice that the warehouse configuration should be an S3 path: s3a://iceberg-datalake/

Fokko, Jan 22 '25 11:01
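To try the suggestion above without touching the rest of the setup, a sketch that loads the table once with the warehouse name Nessie defines (`warehouse`) and once with the S3 location (`s3a://iceberg-datalake/`). Whether Nessie accepts a location in place of a configured warehouse name is an assumption to check against the Nessie docs. The helper names are hypothetical, and PyIceberg is imported lazily so the property helper can be used on its own.

```python
def nessie_rest_properties(warehouse: str) -> dict[str, str]:
    """Client properties from the original report, with the warehouse made configurable."""
    return {
        "uri": "http://10.55.134.161:19120/iceberg",
        "s3.endpoint": "http://10.55.134.161:9000",
        "warehouse": warehouse,
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
    }


def load(warehouse: str):
    # Lazy import: only needed when actually talking to the catalog.
    from pyiceberg.catalog import load_catalog

    return load_catalog("rest", **nessie_rest_properties(warehouse))


if __name__ == "__main__":
    for candidate in ("warehouse", "s3a://iceberg-datalake/"):
        try:
            table = load(candidate).load_table("test.emp")
            print(candidate, "->", table.metadata_location)
        except Exception as exc:  # report the failure and try the next variant
            print(candidate, "->", type(exc).__name__, exc)
```

On success the printed `metadata_location` shows which metadata file was resolved; on failure the exception type and message are printed, so the two variants can be compared side by side.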

Hi @Fokko

I have a similar question about reading an Iceberg table from a Nessie server.

I set up a Nessie server locally, and I would like to access the Iceberg table stored in it.

Here is the output of response = requests.get("http://localhost:19120/api/v1/config", auth=HTTPBasicAuth("test-nessie", "test-nessie"))

Response JSON: {'defaultBranch': 'main', 'maxSupportedApiVersion': 2}

from pyiceberg.catalog import load_catalog
catalog = load_catalog(
    "nessie",
    uri= "http://localhost:19120/api",
    ref= "main",
    authentication={
        "type": "BASIC",
        "username": "test-nessie",
        "password": "test-nessie"
    }
)

This code fails with 2 validation errors for ConfigResponse, because ConfigResponse expects this format:

class ConfigResponse(IcebergBaseModel):
    defaults: Properties = Field()
    overrides: Properties = Field()

However, the response contains defaultBranch and maxSupportedApiVersion instead. Any thoughts on how to read from the local server?

HungYangChang, Jan 27 '25 14:01
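The validation error suggests the client is talking to Nessie's native API rather than to an Iceberg REST endpoint. With an `http://` uri and no explicit type configured elsewhere, PyIceberg treats the catalog as REST (the name `"nessie"` is just a label) and first requests `<uri>/v1/config`; with uri `http://localhost:19120/api` that is exactly the `/api/v1/config` URL queried above, which returns Nessie's own `defaultBranch`/`maxSupportedApiVersion` payload instead of the `defaults`/`overrides` object ConfigResponse expects. A sketch that builds the config URL and checks whether a payload has the Iceberg REST shape; the helper names are hypothetical, and the `/iceberg` path is taken from the first report in this thread.

```python
import base64
import json
from typing import Optional
from urllib.request import Request, urlopen


def config_url(uri: str) -> str:
    """First URL a REST catalog client requests: <uri>/v1/config."""
    return uri.rstrip("/") + "/v1/config"


def looks_like_iceberg_rest_config(payload: dict) -> bool:
    """Iceberg REST config responses carry 'defaults' and 'overrides' objects."""
    return isinstance(payload.get("defaults"), dict) and isinstance(payload.get("overrides"), dict)


def fetch_config(uri: str, user: Optional[str] = None, password: Optional[str] = None) -> dict:
    """GET <uri>/v1/config, optionally with HTTP basic auth, and decode the JSON body."""
    request = Request(config_url(uri))
    if user is not None:
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    with urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    for uri in ("http://localhost:19120/api", "http://localhost:19120/iceberg"):
        try:
            payload = fetch_config(uri, "test-nessie", "test-nessie")
            print(uri, "->", looks_like_iceberg_rest_config(payload), payload)
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            print(uri, "->", exc)
```

The native endpoint's `{'defaultBranch': 'main', 'maxSupportedApiVersion': 2}` payload fails the check; a uri whose config passes it is the one to hand to PyIceberg.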

@HungYangChang Check this https://github.com/apache/iceberg-python/issues/1524

heman026, Jan 28 '25 04:01

Good catch @heman026, did that resolve your issue?

kevinjqliu, Jan 28 '25 15:01

It is not working for me so far; I am still trying.

My config:

from pyiceberg.catalog import load_catalog

print("Initializing Nessie client...")
catalog = load_catalog(
    "rest",
    **{
        "uri": "http://10.3.120.105:19120/iceberg",
        "authentication.type": "BASIC",
        "authentication.username": "test-nessie",
        "authentication.password": "test-nessie",
    },
)
print("Set up correctly")

I still get the same error:

pydantic_core._pydantic_core.ValidationError: 2 validation errors for ConfigResponse
defaults
  Field required [type=missing, input_value={'defaultBranch': 'main',...SupportedApiVersion': 2}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
overrides
  Field required [type=missing, input_value={'defaultBranch': 'main',...SupportedApiVersion': 2}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing

I also tried http://localhost:19120/iceberg; it doesn't work either.

My version: pyiceberg==0.8.1

By the way, I can confirm the Nessie server is set up correctly: [screenshot]

And under http://localhost:19120/content/main/demo_0128_v3/names: [screenshot]

HungYangChang, Jan 28 '25 16:01

@HungYangChang I would recommend looking at the Nessie documentation on how to connect to Nessie as an Iceberg REST catalog. PyIceberg accepts the standard Iceberg REST catalog configuration: https://py.iceberg.apache.org/configuration/#rest-catalog

kevinjqliu, Jan 28 '25 21:01
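Following that advice, a sketch that uses only standard REST catalog keys from the PyIceberg configuration page (`type`, `uri`, `warehouse`, `s3.*`) and points `uri` at Nessie's `/iceberg` endpoint, as in the first report. It leaves out the `ref` and `authentication` arguments from the earlier snippet and assumes server authentication is disabled (as with `nessie.server.authentication.enabled=false` in the first report). The helper name is hypothetical.

```python
from typing import Optional


def rest_catalog_properties(
    uri: str,
    warehouse: Optional[str] = None,
    s3_endpoint: Optional[str] = None,
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
) -> dict[str, str]:
    """Build standard PyIceberg REST catalog properties, skipping unset values."""
    props = {"type": "rest", "uri": uri}
    optional = {
        "warehouse": warehouse,
        "s3.endpoint": s3_endpoint,
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
    }
    props.update({key: value for key, value in optional.items() if value is not None})
    return props


if __name__ == "__main__":
    # Lazy import so the helper above works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("nessie", **rest_catalog_properties("http://localhost:19120/iceberg"))
    print(catalog.list_namespaces())  # smoke test: does the catalog answer at all?
```

Listing namespaces before loading a table separates "cannot reach the catalog" from "cannot read a particular table's metadata".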

@HungYangChang did you resolve that? I have the same issue.

aliSadegh, Mar 07 '25 17:03

This is my working setup; your Nessie configuration might differ: https://github.com/apache/iceberg-python/issues/1524#issuecomment-2683847774 Can you check whether it works for you?

heman026, Mar 11 '25 01:03