mack

[Question] Ability of partition pruning

Open DashaBulanova opened this issue 3 years ago • 3 comments

Hi,

Thank you for the good library.

What about adding partition pruning support to the function type_2_scd_generic_upsert? For big data sets, it could be very relevant. What do you think?

DashaBulanova, Dec 14 '22 09:12

So I typically think of partition pruning when filtering... can you help me understand this feature request? Can you perhaps send a code snippet? Thank you!

MrPowers, Dec 19 '22 22:12

Hi Matthew,

Yes, you are absolutely right. It's one of the query optimization techniques.

What I meant is that if we know additional business information about our dataset, we don't need to read all of the data (spanning years); we can read only the partitions that are relevant to the specific use case.

For example, suppose we know that payments can only change within one quarter. We could let users provide that information to reduce the number of partitions read from the DeltaTable.

In terms of code, we could add a filter parameter and apply it when reading the DeltaTable.
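A minimal, framework-free sketch of the proposal, assuming a table partitioned by a hypothetical `quarter` column and a hypothetical `partition_filter` argument. This is not mack's API and does not use Spark or Delta Lake; it only illustrates the logic: the filter decides which partitions are scanned for existing current rows, so partitions outside it are never read.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory stand-in for a Delta table partitioned by quarter:
# partition value (e.g. "2022-Q4") -> rows stored in that partition.
Row = Dict[str, object]
Table = Dict[str, List[Row]]


def prune_partitions(table: Table,
                     partition_filter: Optional[Callable[[str], bool]] = None) -> List[str]:
    """Return the partitions to scan; with no filter, every partition is scanned."""
    if partition_filter is None:
        return list(table)
    return [p for p in table if partition_filter(p)]


def scd2_upsert_pruned(table: Table,
                       updates: List[Row],
                       primary_key: str,
                       attr_cols: List[str],
                       effective_date: str,
                       partition_filter: Optional[Callable[[str], bool]] = None) -> List[str]:
    """Hypothetical Type 2 SCD upsert that only searches the pruned partitions.

    Each update row must carry a "quarter" key naming its partition.
    Returns the list of partitions that were scanned.
    """
    scanned = prune_partitions(table, partition_filter)
    for upd in updates:
        # Look for the current version of this key, but only in scanned partitions.
        current = next(
            (row for p in scanned for row in table[p]
             if row[primary_key] == upd[primary_key] and row["is_current"]),
            None,
        )
        if current is not None and all(current[c] == upd[c] for c in attr_cols):
            continue  # attributes unchanged: nothing to write
        if current is not None:
            # Close out the old version.
            current["is_current"] = False
            current["end_date"] = effective_date
        # Append the new current version to the update's own partition.
        table.setdefault(upd["quarter"], []).append(
            {**upd, "is_current": True, "effective_date": effective_date, "end_date": None}
        )
    return scanned


# Usage: a payment changed in 2022-Q4, and we know changes stay within one quarter.
table: Table = {
    "2022-Q3": [{"payment_id": 1, "amount": 100, "quarter": "2022-Q3",
                 "is_current": True, "effective_date": "2022-07-01", "end_date": None}],
    "2022-Q4": [{"payment_id": 2, "amount": 50, "quarter": "2022-Q4",
                 "is_current": True, "effective_date": "2022-10-01", "end_date": None}],
}
updates = [{"payment_id": 2, "amount": 75, "quarter": "2022-Q4"}]
scanned = scd2_upsert_pruned(table, updates, "payment_id", ["amount"], "2022-12-14",
                             partition_filter=lambda q: q == "2022-Q4")
print(scanned)  # ['2022-Q4']
```

The filter is only safe when a business rule guarantees it, as in the payments example: if a key's current row sits in a pruned partition, this sketch will not find it and will insert a second current row. In Delta Lake itself, the documented way to get a similar effect in a MERGE is to add known constraints on the partition columns to the match condition, which reduces the search space for matches.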

PS: I can create a PR for this proposal if you think it would be useful.

DashaBulanova, Jan 03 '23 12:01

@DashaBulanova Hi Dasha,

Thank you very much for explaining! Yes, it would be great if you could create a PR including tests that show the expected behaviour :)

robertkossendey, Jan 04 '23 12:01