Add DataCatalog support

Open rtyler opened this issue 2 years ago • 6 comments

Environment

Delta-rs version: 0.13.0

Binding: Python (deltalake-0.13.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl)

Environment:

  • OS: Linux/amd64
  • Other:

Bug

When trying to use either flavor of DataCatalog a ValueError is thrown.

What happened:

❯ python3
Python 3.11.4 (main, Jun 28 2023, 19:51:46) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from deltalake import DeltaTable, DataCatalog
>>> dt = DeltaTable.from_data_catalog(DataCatalog.AWS, 'db', 'table')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tyler/source/github/noviconnect/venv/lib64/python3.11/site-packages/deltalake/table.py", line 287, in from_data_catalog
    table_uri = RawDeltaTable.get_table_uri_from_data_catalog(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Catalog 'glue' not available.
>>> dt = DeltaTable.from_data_catalog(DataCatalog.UNITY, 'db', 'table')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tyler/source/github/noviconnect/venv/lib64/python3.11/site-packages/deltalake/table.py", line 287, in from_data_catalog
    table_uri = RawDeltaTable.get_table_uri_from_data_catalog(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Catalog 'unity' not available.
>>>

What you expected to happen:

C'mon son.

How to reproduce it:

More details:

rtyler avatar Nov 14 '23 15:11 rtyler

take

r3stl355 avatar Nov 18 '23 22:11 r3stl355

I was unable to reproduce the glue error, but I am on a Mac. I'll keep poking around, but if you build with native-tls then this could be the reason (along with a few other places where the glue feature is used alone): https://github.com/delta-io/delta-rs/blob/dd6b45362a14c0f127b32c4b81afc15d17f710d5/crates/deltalake-core/src/data_catalog/mod.rs#L141

As for the unity error, I suspect it could be misleading because of this; the code should just return the original error, since it has the right info. I'll change it: https://github.com/delta-io/delta-rs/blob/dd6b45362a14c0f127b32c4b81afc15d17f710d5/python/src/lib.rs#L136

r3stl355 avatar Nov 19 '23 13:11 r3stl355

@r3stl355 I have a feeling that this error might still exist in main, albeit with better error messages. I think the problem is that the Linux wheels don't have the glue feature enabled.

rtyler avatar Nov 22 '23 23:11 rtyler

I'll have a look; I need to build myself a Linux box. Are you building with any specific settings or just using the standard build, @rtyler?

r3stl355 avatar Nov 22 '23 23:11 r3stl355

Hey, I need to understand the problem better here. I tried this in a Docker container and on an Ubuntu 22.04 VM on AWS, using both a build from source and the released version (deltalake-0.13.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl), and I get something like the following, which is a meaningful error.

Traceback (most recent call last):
  File "/home/ubuntu/delta-rs/python/issue_1860.py", line 4, in <module>
    dt = DeltaTable.from_data_catalog(DataCatalog.AWS, 'db', 'table')
  File "/home/ubuntu/delta-rs/python/deltalake/table.py", line 287, in from_data_catalog
    table_uri = RawDeltaTable.get_table_uri_from_data_catalog(
OSError: Catalog glue error: Entity Not Found

@rtyler - what am I missing? I think the Entity Not Found error I am getting is coming from Glue, no?

Just confirmed, this is a Glue error from rusoto: https://github.com/delta-io/delta-rs/blob/fa6c5139033a06274dc829e0cf4053f72b0a9887/crates/deltalake-core/src/data_catalog/mod.rs#L62
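
For anyone comparing the two behaviours in this thread, here is a minimal sketch of how calling code might tell them apart. It relies only on the exception types and message texts shown in the tracebacks above (`ValueError: Catalog '...' not available.` versus `OSError: Catalog glue error: ...`). The helper name `classify_catalog_error` and the returned labels are hypothetical, and matching on message text is an assumption that may stop working if the messages change.

```python
def classify_catalog_error(exc: Exception) -> str:
    """Roughly classify an exception raised while resolving a catalog table.

    Hypothetical helper: the checks mirror the two tracebacks quoted in this
    thread and are not part of the deltalake API.
    """
    message = str(exc)
    # ValueError "Catalog '<name>' not available." was seen when the
    # catalog could not be used at all.
    if isinstance(exc, ValueError) and "not available" in message:
        return "catalog-unavailable"
    # OSError "Catalog glue error: ..." was seen when the catalog was
    # reached and the service itself reported a problem.
    if isinstance(exc, OSError) and message.startswith("Catalog"):
        return "catalog-service-error"
    return "unknown"


if __name__ == "__main__":
    # Messages copied from the tracebacks in this issue.
    print(classify_catalog_error(ValueError("Catalog 'glue' not available.")))
    print(classify_catalog_error(OSError("Catalog glue error: Entity Not Found")))
```

In real use the exception would come from wrapping the `DeltaTable.from_data_catalog(...)` call in a `try`/`except (ValueError, OSError)` block and passing the caught exception to the helper.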

r3stl355 avatar Nov 24 '23 20:11 r3stl355

Reopening this since we likely want to re-add that once catalogs are working again.

roeap avatar Dec 11 '23 18:12 roeap