Add support to translate `object_store` storage options to `daft.io.IOConfig`
Several other libraries pass around a storage options dictionary that the `object_store` Rust crate then uses to authenticate and to perform reads and writes. To make it easier for users to move to Daft, we could let them use their existing storage options in Daft.
There are two ways to do this:
- Create a function like `storage_options_to_io_config(options: dict[str, str]) -> IOConfig` that does this conversion. One thing to figure out here is that we would need to know which cloud provider they are using, since storage options are not disjoint across cloud providers.
- Allow users to pass in `storage_options` wherever they can pass `io_config`. In this case we can usually infer the cloud provider, so it would probably be a cleaner API, but it would make it harder for users to take advantage of authentication flows that we support but `object_store` doesn't.
Another thing to consider is whether we want to use the mappings in the `object_store` crate, which would require dipping into the Rust layer, or copy the mappings into our own code.
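A minimal sketch of the first approach, with the mappings copied into Python rather than read from the Rust layer. Everything here is hypothetical: the helper names `infer_provider` and `storage_options_to_config_kwargs`, the small subset of `object_store` keys covered, and the target keyword names, which are assumed to match the parameters of Daft's `S3Config` and `AzureConfig`. The key `endpoint` appears in both tables (assumed from `object_store`'s aliases), which is why the provider must be known or inferred from the URL before converting.

```python
from urllib.parse import urlparse

# Hypothetical subset of object_store option keys, mapped to keyword names
# assumed to be accepted by daft.io.S3Config / daft.io.AzureConfig.
_KEY_MAPPINGS: dict[str, dict[str, str]] = {
    "s3": {
        "aws_access_key_id": "key_id",
        "access_key_id": "key_id",
        "aws_secret_access_key": "access_key",
        "secret_access_key": "access_key",
        "aws_session_token": "session_token",
        "session_token": "session_token",
        "aws_region": "region_name",
        "region": "region_name",
        "aws_endpoint": "endpoint_url",
        "endpoint": "endpoint_url",  # same key name as Azure below
    },
    "azure": {
        "azure_storage_account_name": "storage_account",
        "account_name": "storage_account",
        "azure_storage_account_key": "access_key",
        "account_key": "access_key",
        "azure_storage_sas_key": "sas_token",
        "sas_token": "sas_token",
        "endpoint": "endpoint_url",  # same key name as S3 above
    },
}

# URL schemes used to infer the provider when the caller passes a path.
_SCHEMES = {"s3": "s3", "s3a": "s3", "az": "azure", "abfs": "azure", "abfss": "azure"}


def infer_provider(uri: str) -> str:
    """Guess the cloud provider from a URI's scheme."""
    scheme = urlparse(uri).scheme.lower()
    if scheme not in _SCHEMES:
        raise ValueError(f"cannot infer cloud provider from {uri!r}")
    return _SCHEMES[scheme]


def storage_options_to_config_kwargs(options: dict[str, str], provider: str) -> dict[str, str]:
    """Translate object_store-style options into kwargs for one Daft config class."""
    mapping = _KEY_MAPPINGS[provider]
    kwargs: dict[str, str] = {}
    unknown: list[str] = []
    for key, value in options.items():
        target = mapping.get(key)
        if target is None:
            unknown.append(key)
        else:
            kwargs[target] = value
    if unknown:
        # Fail loudly instead of silently dropping keys that are not
        # object_store options at all (e.g. delta-rs's allow_unsafe_rename).
        raise ValueError(f"unsupported {provider} storage options: {sorted(unknown)}")
    return kwargs


# Usage, assuming Daft's config classes accept these keyword names:
# kwargs = storage_options_to_config_kwargs(opts, infer_provider("s3://bucket/table"))
# io_config = daft.io.IOConfig(s3=daft.io.S3Config(**kwargs))
```

Raising on unknown keys is a design choice: it surfaces options that belong to another layer instead of ignoring them.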
I want to pass this option, but I don't know how to do it:
`storage_options={"allow_unsafe_rename":"true"}`
@djouallah Looks like allow_unsafe_rename is an option that is used by delta-rs rather than object store.
A workaround should be to set
export AWS_S3_ALLOW_UNSAFE_RENAME=true
source: https://delta-io.github.io/delta-rs/usage/writing/writing-to-s3-with-locking-provider/
Yes, but how do I do it in Daft? That was my question.
This isn't a Daft-specific configuration! It actually comes from delta-rs, and it isn't an `object_store` configuration either. You can just set the environment variable like so in your program, which will correctly configure delta-rs:
export AWS_S3_ALLOW_UNSAFE_RENAME=true
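A small sketch of doing the same from inside Python, which matters in a notebook: a shell `export`, or a `!export ...` cell (which runs in a subshell), does not change the environment of the Python kernel itself, whereas `os.environ` does. The `write_deltalake` call in the comment is an assumption about Daft's DataFrame API and is not executed here.

```python
import os

# Set the variable in the current Python process, before the Delta write
# happens, so delta-rs can see it when it builds its object store.
os.environ["AWS_S3_ALLOW_UNSAFE_RENAME"] = "true"

# Then write as usual, e.g. (assuming Daft's DataFrame.write_deltalake):
# df.write_deltalake("s3://bucket/path/to/table")
```

In IPython/Jupyter, the `%env AWS_S3_ALLOW_UNSAFE_RENAME=true` magic has the same effect as the `os.environ` assignment.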
no luck in a notebook :(
OSError: Generic LocalFileSystem error: Unable to copy file from /synfs/lakehouse/default/Tables/T10/daft/_delta_log/_commit_c475e751-6256-4777-8fa7-fc8f1704d785.json.tmp to /synfs/lakehouse/default/Tables/T10/daft/_delta_log/00000000000000000000.json: Function not implemented (os error 38)
@jaychia @kevinzwang Let's expose an option to allow allow_unsafe_rename. I dug through the delta-rs code and it looks like they overload allow_unsafe_rename to do both AWS_S3_ALLOW_UNSAFE_RENAME for S3 and an allow path for other filesystems.
https://github.com/delta-io/delta-rs/blob/f05b2bf31530def92cdf7c5f22812e3ed6fe4eec/crates/aws/src/storage.rs#L419C17-L419C36
@jaychia I think this is the codepath that is getting hit when allow_unsafe_rename is set and the object store is mounted locally.
https://github.com/delta-io/delta-rs/blob/f05b2bf31530def92cdf7c5f22812e3ed6fe4eec/crates/mount/src/lib.rs#L46
LOL, it seems like they reused the key allow_unsafe_rename for both S3 and the mount filesystem.
https://github.com/delta-io/delta-rs/blob/f05b2bf31530def92cdf7c5f22812e3ed6fe4eec/crates/mount/src/config.rs#L29
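Because delta-rs overloads one key across backends, an option exposed by Daft would have to route it to the right backend setting. A minimal sketch of that routing via environment variables, assuming S3 URIs use `AWS_S3_ALLOW_UNSAFE_RENAME` and local or mounted paths (no scheme, or `file://`, like the `/synfs/...` path in the error above) use `MOUNT_ALLOW_UNSAFE_RENAME` from the mount crate's config. Both helper names are hypothetical, not an existing Daft or delta-rs API.

```python
import os
from urllib.parse import urlparse


def unsafe_rename_env_var(table_uri: str) -> str:
    """Hypothetical: pick the delta-rs env var for this table location."""
    scheme = urlparse(table_uri).scheme.lower()
    if scheme in ("s3", "s3a"):
        return "AWS_S3_ALLOW_UNSAFE_RENAME"
    if scheme in ("", "file"):
        # Plain local paths, including locally mounted object stores.
        return "MOUNT_ALLOW_UNSAFE_RENAME"
    raise ValueError(f"no unsafe-rename setting known for {table_uri!r}")


def allow_unsafe_rename(table_uri: str) -> None:
    """Hypothetical: enable unsafe renames for the backend of table_uri."""
    os.environ[unsafe_rename_env_var(table_uri)] = "true"
```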
Yeah, we can definitely add this. First, @djouallah, could you check whether setting `export MOUNT_ALLOW_UNSAFE_RENAME=true` fixes the error you saw?
It is working, and it is freaking fast!!! Interesting.
Question: how do I partition by a column, and is there a way to control the file size? It seems Daft generates really small files (15 MB).
Edit: it works fine in delta-rs 0.17.4 but not in 0.18.2.
@djouallah We do not yet have the ability to do partitioned writes, but we are working on it! As for file sizes, maybe we can expose a config parameter for that; I'll take a look.
Do you see a specific error with 0.18.2, or does it just have the same behavior as when MOUNT_ALLOW_UNSAFE_RENAME is not set?