Consolidate catalog behavior
Feature Request / Improvement
We should ensure consistency in catalog behaviors for all implementations.
Currently, we have a few different catalog implementations (dynamodb, glue, hive, rest, and sql). Each catalog implementation supports a set of functions described in the Catalog ABC class (https://github.com/apache/iceberg-python/blob/main/pyiceberg/catalog/init.py#L276)
The behavior for each catalog implementation should be standardized. And we should test that a set of behaviors is consistent across all catalog implementations.
Here are a few related issues
- #742
- #769
- #507
- #849
Add SqlCatalog to test console #488 is an example PR that utilizes tests/cli/test_console.py to ensure the same behaviors for different catalogs.
Hii Kevin, Can I take up this issue?
@rohithkanchukatla sure thing
@kevinjqliu this can be closed as well as all tasks in this issue are completed.
The behavior for each catalog implementation should be standardized. And we should test that a set of behaviors is consistent across all catalog implementations.
@soumya-ghosh I want to keep this issue open until we have a solution to address future updates. Ideally, there's a set of tests that will check the behavior of all the catalog implementations.
Add SqlCatalog to test console https://github.com/apache/iceberg-python/pull/488 is an example PR that utilizes tests/cli/test_console.py to ensure the same behaviors for different catalogs.
A possible solution is to test the input and expected output via the CLI
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
We can delete mountains of code if this is done correctly....
What about a strategy of each test_catalog providing a suite of basic fixtures that can be passed into a parameterized suite of functions?
Something like...
@mock_aws
@pytest.fixture
def catalog_list_tables_fixture(...)
...
Like the glue test, and dynamo test (ect) could all provide a fixture like that? Unsure the best architecture but I think doing it correctly can vastly reduce the amount of code in the catalog test suite
@kevinjqliu what do you think about something like this...
from unittest.mock import MagicMock
import pytest
from pyiceberg.catalog import Catalog
from tests.conftest import BUCKET_NAME
HIVE_CATALOG_NAME = "hive"
HIVE_METASTORE_FAKE_URL = "thrift://unknown:9083"
@pytest.fixture
def glue(_bucket_initialize: None, moto_endpoint_url: str) -> Catalog:
from pyiceberg.catalog.glue import GlueCatalog
catalog = GlueCatalog(
name="glue",
bucket_name=BUCKET_NAME,
endpoint_url=moto_endpoint_url,
)
catalog.create_namespace("foobar")
return catalog
@pytest.fixture
def hive() -> Catalog:
from pyiceberg.catalog.hive import HiveCatalog
catalog = HiveCatalog(
name="hive",
uri=HIVE_METASTORE_FAKE_URL,
)
catalog._client = MagicMock()
catalog._client.__enter__().get_all_databases.return_value = ["foobar"]
return catalog
@pytest.mark.parametrize("catalog", [
pytest.lazy_fixture("glue"),
pytest.lazy_fixture("hive")
])
def test_list_namespaces_common(catalog: Catalog) -> None:
namespaces = catalog.list_namespaces()
assert namespaces == [("foobar",)]
I think it becomes more of a configuration workflow than a "catalog" workflow if that makes sense?
Got a good framework started in #2090 thanks to @.jayceslesar
Now we can add more tests, more catalog types and maybe even remove some redundant tests