delta-rs icon indicating copy to clipboard operation
delta-rs copied to clipboard

Adopt influxdata/object_store_rs?

Open wjones127 opened this issue 2 years ago • 5 comments

Description

This crate maintains its own notion of StorageBackend, but it might be more than we need to maintain for Delta Lake support.

influxdata/object_store_rs implements an ObjectStore trait that covers the basics we need and provides a point of collaboration with the rest of the Rust data ecosystem. It looks likely that DataFusion will also adopt this crate eventually instead of it's internal ObjectStore trait, which is less complete (see apache/arrow-datafusion#2489).

If ObjectStore exposed the underlying client, it seems like we could simply make StorageBackend depend on that trait and only have to maintain the rename_obj_noreplace() method:

#[async_trait::async_trait]
pub trait StorageBackend: Send + Sync + Debug + ObjectStore {
    async fn rename_obj_noreplace(&self, src: &str, dst: &str) -> Result<(), StorageError>;
}

What do we think of that?

wjones127 avatar May 20 '22 02:05 wjones127

I think this is a great idea. As you said in a first step it would reduce complexity in this crate. When adopted by datafusion. it would also greatly simplify the datafusion interop in the TableProvider and the operations.

One thing to mention is that right now it does not support ADLS Gen2 which we need for rename_obj_noreplace unless we also want to build a lock client for Azure. But I'd be happy to port the Gen2 support over there.

roeap avatar May 20 '22 04:05 roeap

+1 for me to push the rust data eng community towards a single feature rich and highly optimized object store crate!

houqp avatar May 30 '22 02:05 houqp

We still need to add the dynamodb lock integration to object_store_rs before it's ready for us to start integrating. I suspect we will have to write some adapters to the object stores to instantiate them with the various configuration options we have. Anything else we need to do?

One important difference between our implementation and object_store_rs is that there is that it implements rename_if_not_exists with copy_if_not_exists and delete for all stores. The only ones we implement as a single call are local file system and Azure. They seem to be reluctant to add the local filesystem calls because they are so complex and OS-dependent; but if a strong case could be made they make a material difference on performance then we can likely add them.

In the Azure case, I think if we keep as two calls we can drop the requirement for ADLS2 and just support Blob store generally, right? Would we see that as an improvement? cc @roeap @thovoll

wjones127 avatar Jul 11 '22 04:07 wjones127

but if a strong case could be made they make a material difference on performance then we can likely add them.

I think we only ever use rename_if_not_exists when committing transactions. As such I think it's a relatively rare (while critical) occurrence and doing two calls will add negligible overhead to the transaction as a whole.

Would we see that as an improvement?

With the broader support, we also get the ability to use Azurite for integration tests - which is a big win. The biggest loss might be the directory semantics for list operations. Then again, we just made that call recursive for the vacuum operations. When listing the delta log there should be not much difference, since we filter by prefix and its flat. When listing files it would be nice to exclude all log, index files etc. but we need to traverse into partition folders anyhow.

Another open point in object_store_rs is the authorization options. Right now I think azure and s3 only support authorization using a master key, while gcp uses an oauth flow based on client certificate. From what I can see we could try and make the token provider trait based over there and keep using the client libraries over here, or implement more oauth flows. This would leave managed identities, which AFAIK in azure do not count as an official oauth flow. Given what people have been asking for - and also what's needed on my end - support for master key, client secrets, client certificates, and managed identities should be supported.

More generally speaking I have been hoping to maybe provide more dedicated stores (with a shared object_store) - one for the log and one for interacting with the data files. But that's probably a larger discussion. However we should probably think about how we want to leverage the Path object from object_store_rs. My current favourite would be to always deal in relative paths from the table root. The local store has the concept of a root, while the cloud stores are always rooted on the bucket/container level. If we want to go that way I think we will we need at least a thin wrapper around the base store.

roeap avatar Jul 11 '22 06:07 roeap

One important difference between our implementation and object_store_rs is that there is that it implements rename_if_not_exists with copy_if_not_exists and delete for all stores. The only ones we implement as a single call are local file system and Azure.

I think HDFS will also be able to leverage this single atomic operation optimization too. With the wrapper trait design you proposed in the description, we should still be able to keep our current atomic rename implementation while migrating the rest to object_store_rs, no?

For local file, the performance difference probably won't matter too much because hardlink only incurs metadata copy, so it's cheap. The only minor downside is left over files on crash. But I am not so sure about Azure and HDFS. If the copy call in those object stores requires copying of actual object data, then it will increase the commit latency by a lot.

Considering the code to do perform atomic rename for both local file and azure is so small (<100 LoC), I would suggest we keep it for a better user experience. Getting rid of them doesn't bring any value to our users, it's a net negative for them. It will delete 150 lines of code from our point of view, but I don't think those specialized code actually increase the maintenance burden for us?

In the Azure case, I think if we keep as two calls we can drop the requirement for ADLS2 and just support Blob store generally, right? Would we see that as an improvement? cc @roeap @thovoll

In an ideal world, I think it's best if we don't require ADLS2 for azure, but make it an optional feature that users can enable if they want to have lower commit latency. Similar to how we make the dynamodb lock an optional feature for S3 store. So yeah, if we can drop the requirement for ADLS2, I would agree it's an improvement :)

houqp avatar Jul 18 '22 03:07 houqp

I'm thinking now a good transition path is to:

  • [ ] Rewrite the delta-rs object stores to just implement the ObjectStore trait from object-store-rs, completely removing our own StorageBackend stuff.
  • [ ] Once our own object stores become redundant due to upstream work in object-store-rs, deprecate them in delta-rs.

wjones127 avatar Aug 19 '22 01:08 wjones127

Agreed! Also very much looking forward to all the dependencies we a re going to loose :) - the AWS migration already landed, and Azure is on its way (apache/arrow-rs#2509).

Do we already know what the target design for the object sore looks like in general? To the best of my recollection we are somewhere at:

  • Keep a wrapper around the specific stores to:
    • put the object store root at the table root to handle paths as they appear in the delta log.
    • handle missing locks until (if?) that is moved upstream

With the second point I am not sure what would be the best approach. Have it handled in a wrapper store, or just wrap the s3 store to handle that? Writing this I think the former might be better?

As for when are the object_store_rs stores ready for use here? Main point I see is authorization. From what I can tell, S3 supports a sufficient amount of options already, Azure supports access key and service principal auth, which is en-par with what we have here. Soon I hope to add managed identity support as well. ~~GCP I cannot really judge. It seems to support auth based on a service account file, but what options there are and how they are typically used I have no idea ...~~ Just read through the GCP backends code, and I think supported authorization for GCP is the same here and in object_store_rs.

roeap avatar Aug 20 '22 05:08 roeap