Support Iceberg Hadoop catalog in Python library
Feature Request / Improvement
Migrated ticket https://github.com/apache/iceberg/issues/3220
Check the original ticket for details.
We use the Hadoop catalog on a file system with atomic move support. Would you accept a contributed Hadoop catalog for PyIceberg?
@Fokko, I forgot to ping you. Thanks!
This would really help us out. We use the Hadoop catalog for unit testing PySpark code, and we are increasingly encountering cases where we want to test code that uses both PyIceberg and PySpark and expects them to share the same catalog.
Hey @brianfromoregon, we try to avoid implementing the Hadoop catalog in PyIceberg. Its implementation differs from the other catalogs because conflict detection relies on the atomic renames of HDFS.
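For readers unfamiliar with rename-based conflict detection, here is a minimal sketch of the general idea using only the Python standard library. It is not PyIceberg or Iceberg code: the function `commit_metadata`, the `v<N>.metadata.json` naming, and the use of a hard link as the "fail if the target exists" primitive are all assumptions made for illustration. It also only behaves correctly on file systems where that link step is atomic, which is exactly the property many object stores lack.

```python
import json
import os
import tempfile
from pathlib import Path


def commit_metadata(table_dir: Path, expected_version: int, metadata: dict) -> int:
    """Hypothetical helper: publish metadata as version expected_version + 1.

    Raises FileExistsError if another writer already committed that version.
    """
    new_version = expected_version + 1
    target = table_dir / f"v{new_version}.metadata.json"

    # Write the new metadata to a temporary file in the same directory first.
    fd, tmp_name = tempfile.mkstemp(dir=table_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(metadata, f)
        # os.link refuses to overwrite an existing target, so at most one
        # writer can publish a given version; the loser sees FileExistsError.
        os.link(tmp_name, target)
    finally:
        # The temporary name is no longer needed whether or not the commit won.
        os.unlink(tmp_name)
    return new_version
```

A writer that gets `FileExistsError` would typically reload the latest version and retry its change on top of it. Catalogs backed by a database or a REST service detect conflicts in the service instead, so they do not depend on this file system guarantee.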
@corleyma Have you tried running a simple REST catalog similar to what we do in the PyIceberg test setup?
@Fokko We do a setup similar to this for integration tests, but the ability to write faster unit tests that depend only on a temp directory fixture in pytest has been great for our PySpark code.
We had separately been using an InMemoryCatalog for unit tests of certain PyIceberg code, but now that we have more functions commingling PySpark and PyIceberg (DDL/metadata manipulation in PyIceberg), we are running into the limits of PyIceberg not supporting the Hadoop catalog.
I would love it if we could add a file system catalog to PyIceberg that is compatible with PySpark and HadoopCatalog. It could be named and documented in whatever way is needed to make clear that it's not a production catalog, but I think it's legitimately useful for testing purposes.
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'