
Python interface for daft.PartitionField and PartitionTransform

Open MingshiPeng opened this issue 11 months ago • 3 comments

Is your feature request related to a problem?

Issue

Daft's PartitionField and PartitionTransform (code) don't expose a Python interface for accessing their attributes. For example, given a PartitionTransform object, there is no way to figure out what type of transform it is. Similarly, a PartitionField object only exposes a PyField object, which isn't enough to represent a partition field.

This issue blocks model conversion between Daft and DeltaCAT.

However, this is not a high-priority issue: it only blocks Daft -> DeltaCAT model conversion, whereas the Daft-DeltaCAT integration only needs DeltaCAT -> Daft model conversion for Daft to be able to read a DeltaCAT table. Related DeltaCAT PR

Describe the solution you'd like

I'd like a more complete Python interface for accessing attributes of PartitionTransform and PartitionField objects.

Describe alternatives you've considered

No response

Additional Context

No response

Would you like to implement a fix?

No

MingshiPeng avatar Apr 21 '25 22:04 MingshiPeng

hi @MingshiPeng, just for some clarity: is the ask that, for PartitionField, you want to access properties like those available on Field (name, dtype, ...)?

something like:

pf: PartitionField
pf.name # "my_field"
pf.dtype # DataType.int64()

and similarly, for PartitionTransform do you need to figure out what variant it is? We recently added similar functionality to DataType. Would this suffice?

example:


pt: PartitionTransform

pt.is_identity()
pt.is_iceberg_bucket()
pt.is_iceberg_truncate()
pt.is_year()
pt.is_month()
pt.is_day()
pt.is_hour()
pt.is_void()

if pt.is_iceberg_bucket():
  n_buckets = pt.num_buckets

if pt.is_iceberg_truncate():
  width = pt.width
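To illustrate how predicates like these could drive a Daft -> DeltaCAT style conversion, here is a minimal sketch. `StubTransform` and `describe_transform` are hypothetical names made up for this example; the stub only mimics the method and attribute names shown above and is not Daft's actual class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StubTransform:
    """Hypothetical stand-in for PartitionTransform (not Daft's real class)."""

    kind: str
    num_buckets: Optional[int] = None  # only meaningful for iceberg_bucket
    width: Optional[int] = None  # only meaningful for iceberg_truncate

    def is_identity(self) -> bool:
        return self.kind == "identity"

    def is_iceberg_bucket(self) -> bool:
        return self.kind == "iceberg_bucket"

    def is_iceberg_truncate(self) -> bool:
        return self.kind == "iceberg_truncate"

    def is_year(self) -> bool:
        return self.kind == "year"

    def is_month(self) -> bool:
        return self.kind == "month"

    def is_day(self) -> bool:
        return self.kind == "day"

    def is_hour(self) -> bool:
        return self.kind == "hour"

    def is_void(self) -> bool:
        return self.kind == "void"


def describe_transform(pt) -> dict:
    """Map a transform to a plain dict using only the is_* predicates."""
    if pt.is_iceberg_bucket():
        return {"name": "bucket", "num_buckets": pt.num_buckets}
    if pt.is_iceberg_truncate():
        return {"name": "truncate", "width": pt.width}
    # The remaining variants carry no parameters.
    for name in ("identity", "year", "month", "day", "hour", "void"):
        if getattr(pt, f"is_{name}")():
            return {"name": name}
    raise ValueError(f"unrecognized partition transform: {pt!r}")


bucket = describe_transform(StubTransform("iceberg_bucket", num_buckets=16))
daily = describe_transform(StubTransform("day"))
```

Because the converter only calls the predicates and reads `num_buckets` / `width`, it would not need to know how the transform is represented internally.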

universalmind303 avatar Apr 22 '25 16:04 universalmind303

https://github.com/ray-project/deltacat/pull/527#discussion_r2053071969

rchowell avatar Apr 22 '25 17:04 rchowell

Hi @universalmind303

For PartitionField: the name and dtype attributes of Field are already accessible here, so I have no request there. My request is for PartitionField to expose accessors for source_field and transform (see PartitionField.__init__() linked below, where those 2 attributes are passed in).

https://github.com/Eventual-Inc/Daft/blob/f1f425220b389e7116ddbf70b10268d85fc32ad4/daft/daft/init.pyi#L746-L751
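As a rough sketch of the kind of interface being requested, the accessors could be read-only properties over the values given to the constructor. `StubField` and `StubPartitionField` are hypothetical stand-ins written for this example, and the constructor signature (a field plus optional `source_field` and `transform`) is an assumption based on the description above, not a copy of Daft's stub file.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class StubField:
    """Hypothetical stand-in for a field with a name and a dtype."""

    name: str
    dtype: str


class StubPartitionField:
    """Hypothetical stand-in showing the requested read-only accessors."""

    def __init__(
        self,
        field: StubField,
        source_field: Optional[StubField] = None,
        transform: Optional[Any] = None,
    ) -> None:
        self._field = field
        self._source_field = source_field
        self._transform = transform

    @property
    def field(self) -> StubField:
        # The partition field itself (already accessible per the thread).
        return self._field

    @property
    def source_field(self) -> Optional[StubField]:
        # The column the partition value is derived from, if any.
        return self._source_field

    @property
    def transform(self) -> Optional[Any]:
        # The transform applied to source_field, if any.
        return self._transform


pf = StubPartitionField(
    StubField("ts_day", "date"),
    source_field=StubField("ts", "timestamp"),
    transform="day",  # placeholder; a real transform object would go here
)
```

With accessors like these, a converter could read `pf.source_field` and `pf.transform` directly instead of only seeing the resulting field.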

For PartitionTransform, the example you shared is exactly the feature I need. I wasn't aware that it existed, so no further request on the PartitionTransform side, thanks.

MingshiPeng avatar Apr 23 '25 19:04 MingshiPeng