Optimize enhancements

Open wjones127 opened this issue 1 year ago • 4 comments

Description

This is an umbrella issue for a variety of improvements we could make to optimize:

  • [x] #1460
  • [x] #1442
  • [x] #1465
  • [ ] Add support for Sort-based optimize
  • [ ] Use Parquet sort metadata to make sort-based optimize skip already sorted files

Use Case

Related Issue(s)

wjones127 avatar Jun 06 '23 03:06 wjones127

Hello @wjones127, I just want to check whether the compression option is already exposed in Python's write_deltalake. I'm not sure whether I missed it somewhere or it isn't ready yet. We're really hoping to use Snappy compression on our outputs, but it might not be available yet. In any case, could you recommend a workaround?

Thank you so much!

mjducut-oe avatar Sep 07 '23 13:09 mjducut-oe

the compression option is already exposed in python write_deltalake?

Nope, no one has made a PR for that yet. I don't think there's any workaround at the moment. Shouldn't be too hard to implement though if someone is motivated to contribute it.

wjones127 avatar Sep 07 '23 21:09 wjones127

the compression option is already exposed in python write_deltalake?

Nope, no one has made a PR for that yet. I don't think there's any workaround at the moment. Shouldn't be too hard to implement though if someone is motivated to contribute it.

I can maybe centralize this in a function once the update and merge PRs are merged. I also partially exposed the writer properties in those. It would be good to make this available across all the APIs.

ion-elgreco avatar Oct 08 '23 19:10 ion-elgreco

@ion-elgreco, sounds great! This will really help us, especially with storage utilization on writes. I'll keep watch, and thank you! 🙏👍

mjducut-oe avatar Oct 09 '23 04:10 mjducut-oe