iceberg-python icon indicating copy to clipboard operation
iceberg-python copied to clipboard

Consolidate catalog behavior

Open kevinjqliu opened this issue 1 year ago • 4 comments

Feature Request / Improvement

We should ensure consistency in catalog behaviors for all implementations.

Currently, we have a few different catalog implementations (dynamodb, glue, hive, rest, and sql). Each catalog implementation supports a set of functions described in the Catalog ABC class (https://github.com/apache/iceberg-python/blob/main/pyiceberg/catalog/init.py#L276)

The behavior for each catalog implementation should be standardized. And we should test that a set of behaviors is consistent across all catalog implementations.

Here are a few related issues

  • #742
  • #769
  • #507
  • #849

Add SqlCatalog to test console #488 is an example PR that utilizes tests/cli/test_console.py to ensure the same behaviors for different catalogs.

kevinjqliu avatar Jun 12 '24 00:06 kevinjqliu

Hii Kevin, Can I take up this issue?

rohithkanchukatla avatar Jul 09 '24 14:07 rohithkanchukatla

@rohithkanchukatla sure thing

kevinjqliu avatar Jul 09 '24 16:07 kevinjqliu

@kevinjqliu this can be closed as well as all tasks in this issue are completed.

soumya-ghosh avatar Aug 14 '24 12:08 soumya-ghosh

The behavior for each catalog implementation should be standardized. And we should test that a set of behaviors is consistent across all catalog implementations.

@soumya-ghosh I want to keep this issue open until we have a solution to address future updates. Ideally, there's a set of tests that will check the behavior of all the catalog implementations.

Add SqlCatalog to test console https://github.com/apache/iceberg-python/pull/488 is an example PR that utilizes tests/cli/test_console.py to ensure the same behaviors for different catalogs.

A possible solution is to test the input and expected output via the CLI

kevinjqliu avatar Aug 15 '24 00:08 kevinjqliu

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Feb 12 '25 00:02 github-actions[bot]

We can delete mountains of code if this is done correctly....

What about a strategy of each test_catalog providing a suite of basic fixtures that can be passed into a parameterized suite of functions?

Something like...

@mock_aws
@pytest.fixture
def catalog_list_tables_fixture(...)
   ...

Like the glue test, and dynamo test (ect) could all provide a fixture like that? Unsure the best architecture but I think doing it correctly can vastly reduce the amount of code in the catalog test suite

jayceslesar avatar Jun 11 '25 00:06 jayceslesar

@kevinjqliu what do you think about something like this...

from unittest.mock import MagicMock

import pytest

from pyiceberg.catalog import Catalog
from tests.conftest import BUCKET_NAME

HIVE_CATALOG_NAME = "hive"
HIVE_METASTORE_FAKE_URL = "thrift://unknown:9083"


@pytest.fixture
def glue(_bucket_initialize: None, moto_endpoint_url: str) -> Catalog:
    from pyiceberg.catalog.glue import GlueCatalog

    catalog = GlueCatalog(
        name="glue",
        bucket_name=BUCKET_NAME,
        endpoint_url=moto_endpoint_url,
    )
    catalog.create_namespace("foobar")

    return catalog


@pytest.fixture
def hive() -> Catalog:
    from pyiceberg.catalog.hive import HiveCatalog

    catalog = HiveCatalog(
        name="hive",
        uri=HIVE_METASTORE_FAKE_URL,
    )
    catalog._client = MagicMock()
    catalog._client.__enter__().get_all_databases.return_value = ["foobar"]

    return catalog


@pytest.mark.parametrize("catalog", [
    pytest.lazy_fixture("glue"),
    pytest.lazy_fixture("hive")
])
def test_list_namespaces_common(catalog: Catalog) -> None:
    namespaces = catalog.list_namespaces()
    assert namespaces == [("foobar",)]

I think it becomes more of a configuration workflow than a "catalog" workflow if that makes sense?

jayceslesar avatar Jun 14 '25 16:06 jayceslesar

Got a good framework started in #2090 thanks to @.jayceslesar

Now we can add more tests, more catalog types and maybe even remove some redundant tests

kevinjqliu avatar Jul 20 '25 19:07 kevinjqliu