mack [Question] Ability of partition pruning

Hi,

Thank you for the great library.

Does the function `type_2_scd_generic_upsert` support partition pruning? For big datasets, this could be very relevant. What do you think?

Dec 14 '22 09:12 DashaBulanova

I typically think of partition pruning in the context of filtering. Can you help me understand this feature request? Could you perhaps send a code snippet? Thank you!

Dec 19 '22 22:12 MrPowers

Hi Matthew,

Yes, you are absolutely right. It's one of the query optimization techniques.

What I meant is: if we know additional business information about our dataset, we don't have to read all of the data (spanning years); we can read only the partitions that are relevant to the specific use case.

More specifically, suppose we know that payments can only change within one quarter. We could then allow users to provide that information, reducing the number of partitions read from the DeltaTable.
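
To make the quarter example concrete, here is a minimal Python sketch of the predicate a user might supply. It assumes "one quarter" means the current calendar quarter and that the table is partitioned by a date column named `payment_date`; both are hypothetical and nothing here is part of mack.

```python
from datetime import date


def quarter_start(day: date) -> date:
    """Return the first day of the calendar quarter containing `day`."""
    first_month = 3 * ((day.month - 1) // 3) + 1
    return date(day.year, first_month, 1)


def quarter_partition_predicate(partition_col: str, day: date) -> str:
    """Build a SQL predicate limiting `partition_col` to the quarter of `day`.

    `partition_col` is a hypothetical date-typed partition column; rows
    older than the quarter start are assumed to be immutable.
    """
    start = quarter_start(day)
    return f"{partition_col} >= '{start.isoformat()}'"


print(quarter_partition_predicate("payment_date", date(2023, 1, 3)))
# payment_date >= '2023-01-01'
```

A rolling three-month window would be an equally valid reading of "one quarter"; only `quarter_start` would need to change.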

In terms of code, we could add a filter parameter and apply it when reading the DeltaTable.
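
As a rough illustration, the sketch below assumes a hypothetical `partition_filter` argument (not part of mack's current API) and shows only how such a predicate could be folded into a MERGE condition string. It does not reproduce mack's internal merge logic. Delta Lake's documentation recommends adding known partition constraints to the merge condition so that fewer files need to be scanned.

```python
from typing import Optional


def build_merge_condition(
    primary_key: str,
    is_current_col_name: str,
    partition_filter: Optional[str] = None,
) -> str:
    """Build a Delta MERGE condition, optionally narrowed by a partition predicate.

    `partition_filter` is the hypothetical new parameter from this proposal;
    it must be written against the target alias `base`.
    """
    conditions = [
        f"base.{primary_key} = updates.{primary_key}",
        f"base.{is_current_col_name} = true",
    ]
    if partition_filter:
        # A predicate on partition columns lets Delta skip partitions it cannot match.
        conditions.append(f"({partition_filter})")
    return " AND ".join(conditions)


print(build_merge_condition("pkey", "is_current", "base.payment_date >= '2023-01-01'"))
# base.pkey = updates.pkey AND base.is_current = true AND (base.payment_date >= '2023-01-01')
```

The resulting string could then be passed as the `condition` argument of delta-spark's `DeltaTable.merge(source, condition)`. Leaving `partition_filter` as `None` keeps the current full-table behaviour.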

PS: I can create a PR for this proposal if you think it would be useful.

Jan 03 '23 12:01 DashaBulanova

@DashaBulanova Hi Dasha,

Thank you very much for explaining! Yes, it would be great if you could create a PR, including tests that show the expected behaviour :)

Jan 04 '23 12:01 robertkossendey