iceberg-python icon indicating copy to clipboard operation
iceberg-python copied to clipboard

feat: implement InMemoryCatalog as a subclass of SqlCatalog

Open hussein-awala opened this issue 1 year ago • 2 comments

closes: #1110

This PR implement a new catalog InMemoryCatalog as a subclass of SqlCatalog with SQLite in-memory.

hussein-awala avatar Sep 05 '24 22:09 hussein-awala

@kevinjqliu I applied what you suggested in the comment above, could you recheck it now?

hussein-awala avatar Sep 06 '24 19:09 hussein-awala

@Fokko wydt of this change? i remember we had past discussions on adding a "new" catalog implementation

kevinjqliu avatar Sep 07 '24 19:09 kevinjqliu

@hussein-awala Thanks for working on this 🚀 @kevinjqliu Regarding the new catalogs, my main concern was a proliferation of new catalogs, and that they would lack maintenance. I do like this change for two reasons:

  • It moves out of the InMemoryCatalog that's specific to tests. We want to have the catalog as part of the tests, otherwise we're testing a catalog that's not part of the normal code-path.
  • It merges the InMemory catalog into the SqlCatalog. This way, when new features are released, such as support for views, multi-table transactions, etc. we have fewer places where we need to implement them.

I'm positive about this change. The only consideration I could make is that we hide the SqlCatalog behind the InMemoryCatalog. Maybe it is interesting for folks to know that they can easily switch to a persistent catalog. What are your thoughts?

Fokko avatar Oct 28 '24 14:10 Fokko

I'm positive about this change. The only consideration I could make is that we hide the SqlCatalog behind the InMemoryCatalog. Maybe it is interesting for folks to know that they can easily switch to a persistent catalog. What are your thoughts?

I think it would be good to document the InMemoryCatalog, perhaps in the catalog section of the configuration page. We can mention that it uses the SqlCatalog under the hood and to use another catalog implementation to persist the catalog metadata

kevinjqliu avatar Oct 28 '24 17:10 kevinjqliu

hey @hussein-awala would you like to make the above changes on docs? This PR is almost ready!

kevinjqliu avatar Feb 01 '25 00:02 kevinjqliu

hey @hussein-awala would you like to make the above changes on docs? This PR is almost ready!

yes, I will make it ready ASAP

hussein-awala avatar Feb 07 '25 20:02 hussein-awala

Or just:

catalog = load_catalog('default', 'type'='in-memory', 'warehouse'='/tmp/pyiceberg/warehouse')

I agree that this catalog impl is mostly focussed on testing/demonstration. If you would use a Jupyter notebook, each time you restart the kernel, then you end up with a fresh catalog (don't have to clean up any old stuff lingering around).

Fokko avatar Feb 10 '25 10:02 Fokko

Thanks for the contribution @hussein-awala and thanks for the review @Fokko

kevinjqliu avatar Feb 10 '25 17:02 kevinjqliu