glaredb

add support for Fabric OneLake storage

Open djouallah opened this issue 1 year ago • 8 comments

Trying this code:

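import glaredb
import pandas as pd

# Small example DataFrame to persist as a table
df = pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
    }
)

# Persist the GlareDB database under the Lakehouse's mounted Files directory
con = glaredb.connect("/lakehouse/default/Files")
# GlareDB can reference the in-scope pandas DataFrame `df` by its variable name
con.sql("CREATE OR REPLACE TABLE xxx AS SELECT * FROM df")
con.close()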

I think you need the latest version of arrow-rs to make this work: https://github.com/apache/arrow-rs/pull/4573

djouallah • Sep 23 '23 07:09

Did some digging on this: we'll likely support abfs://... paths before the Lakehouse file API (/lakehouse/...). There are some challenges around file system operations that blobfuse doesn't implement.
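The two access routes differ in which I/O layer handles the location. As a rough illustration only (this is not GlareDB's actual dispatch code, and `storage_route` is a hypothetical helper), a location could be routed by URL scheme: abfs/abfss URLs go to an Azure object-store client, while plain paths such as /lakehouse/... go through ordinary file system calls on the blobfuse mount, which is where the unimplemented operations surface.

```python
from urllib.parse import urlparse


def storage_route(location: str) -> str:
    """Classify a storage location by scheme (hypothetical helper).

    "object_store": an ABFS URL handled by an Azure object-store client.
    "local_fs": a plain path (e.g. a /lakehouse/... mount) handled by
    ordinary file system calls.
    """
    # urlparse lowercases the scheme; plain paths have an empty scheme.
    scheme = urlparse(location).scheme
    if scheme in ("abfs", "abfss"):
        return "object_store"
    if scheme in ("", "file"):
        return "local_fs"
    raise ValueError(f"unsupported scheme: {scheme!r}")
```

For example, `storage_route("/lakehouse/default/Files")` returns `"local_fs"`, so every rename and link on that path is subject to what blobfuse supports.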


Notes for impl:

  • We'll need to update object_store to explicitly close (drop) the file before calling std::fs::rename; otherwise the metadata isn't flushed in time for the rename. I believe this is actually a bug in blobfuse, since the metadata should be flushed on file create, but isn't.
  • Blobfuse doesn't support hard linking, so copy_if_not_exists just fails. Not sure what to do here yet.
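The two notes above can be illustrated in Python. This is a minimal sketch under stated assumptions, not object_store's code (which is Rust): `put_atomic` and `copy_if_not_exists` are hypothetical helpers mirroring the two operations. The first closes the staging file before renaming it into place; the second tries a hard link first (the approach the notes say fails on blobfuse) and, as an assumed fallback, creates the destination in exclusive-create mode and copies the bytes. That fallback is not atomic: a concurrent reader could observe a partially written file.

```python
import os
import shutil
import tempfile


def put_atomic(path: str, data: bytes) -> None:
    """Write to a staging file, close it, then rename it into place."""
    directory = os.path.dirname(path) or "."
    fd, staging = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # Leaving the `with` block closes the file, flushing Python's buffer
        # and releasing the handle before the rename happens.
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(staging, path)
    except BaseException:
        # Remove the staging file if anything failed before the rename.
        if os.path.exists(staging):
            os.remove(staging)
        raise


def copy_if_not_exists(src: str, dst: str) -> None:
    """Copy src to dst, raising FileExistsError if dst already exists."""
    try:
        # Hard-link approach: atomic, and fails if dst already exists.
        os.link(src, dst)
    except FileExistsError:
        raise
    except OSError:
        # Assumed fallback for file systems without hard links:
        # "xb" opens dst with O_EXCL, so it fails if dst already exists.
        # Not atomic: the copy is visible while it is being written.
        with open(src, "rb") as fin, open(dst, "xb") as fout:
            shutil.copyfileobj(fin, fout)
```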

scsmithr • Sep 24 '23 20:09

As a vote of confidence: a OneLake destination in GlareDB would make me choose this over Fabric any day. Power BI is great, and the concept of OneLake empowering Power BI is great. Fabric, not so much.

jordandakota • Sep 26 '23 22:09

That's fine, you don't need to like the other Fabric engines. OneLake is neutral and works with any engine, as long as it understands Delta tables.

djouallah • Sep 29 '23 07:09

Exactly. I'm currently working with Databricks and have Unity Catalog in OneLake. The only remaining issue is how Unity writes a table name versus how OneLake prefers to see it.

jordandakota • Sep 29 '23 16:09

Any update on this? I presume it should be easy now, as it is supported by delta-rs.

djouallah • Nov 15 '23 05:11

> Any update on this? I presume it should be easy now, as it is supported by delta-rs.

We've made some changes to how we plumb things through to delta-rs, but I haven't yet tested whether this all works with Fabric (either via abfs://... or through the filesystem API). We'll be checking on this over the next couple of days, and I'll follow up with an update.

scsmithr • Nov 16 '23 23:11

Sounds great. Looking forward to it.

jordandakota • Nov 17 '23 00:11

Any update? I see that you are now using the latest version of arrow-rs. Basically, we need something like this:

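from deltalake import write_deltalake  # delta-rs Python bindings
# aadToken: an Azure AD access token for OneLake, obtained beforehand
write_deltalake("abfss://[email protected]/Delta_Table.Lakehouse/Tables/fruit",
                df, storage_options={"bearer_token": aadToken, "use_fabric_endpoint": "true"})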
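The target URI and options in the snippet above follow a fixed shape, which a small helper can assemble. This is a sketch under assumptions: `onelake_table_uri` and `onelake_storage_options` are hypothetical names, the host `onelake.dfs.fabric.microsoft.com` is assumed to be the public OneLake endpoint (the workspace and host in the original snippet were lost in scraping), and the `<item>.Lakehouse/Tables/<table>` layout is taken from the snippet.

```python
def onelake_table_uri(workspace: str, lakehouse: str, table: str) -> str:
    """Build an abfss:// URI for a Delta table in a Fabric Lakehouse.

    Hypothetical helper; layout follows <item>.Lakehouse/Tables/<table>.
    """
    host = "onelake.dfs.fabric.microsoft.com"  # assumption: public OneLake endpoint
    return f"abfss://{workspace}@{host}/{lakehouse}.Lakehouse/Tables/{table}"


def onelake_storage_options(token: str) -> dict:
    """Storage options as used in the snippet: bearer token plus Fabric flag."""
    return {"bearer_token": token, "use_fabric_endpoint": "true"}
```

These would then be passed as `write_deltalake(onelake_table_uri(workspace, "Delta_Table", "fruit"), df, storage_options=onelake_storage_options(aadToken))`, where `workspace` is your own Fabric workspace name.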

djouallah • Jan 31 '24 11:01