dione Consider datasource v2


Open shay1bz opened this issue 3 years ago • 0 comments

Spark 2.4+ DataSource v2 is much more powerful than it was in Spark 2.3. The main issue with Spark 2.3 is that you basically have to implement everything yourself, and dealing with partitioning and bucketing is a headache. This is why we reverted to DataSource v1. Spark 2.4 adds a lot of niceties, so we should consider moving to v2 when we upgrade. We would get:

Natural support for saving as a table (as opposed to now, where we do everything manually).
Easy support for dynamic partitioning. Currently we rely on the user giving a static list of partitions to index.
Maybe even real bucketing, instead of today's workarounds that imitate this mechanism.
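
To make the static vs. dynamic partitioning distinction concrete, here is a minimal, framework-free Python sketch. It is not dione or Spark code: the function names (`static_partitions`, `dynamic_partitions`, `bucket_id`) are hypothetical, rows are plain dicts, and bucketing uses CRC32 modulo as a stand-in, whereas Spark uses its own Murmur3-based hash.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Sequence, Tuple

Row = Mapping[str, object]
PartitionKey = Tuple[object, ...]


def static_partitions(
    rows: Iterable[Row],
    partition_cols: Sequence[str],
    requested: Iterable[PartitionKey],
) -> Dict[PartitionKey, List[Row]]:
    """Static style (hypothetical): the caller lists the partitions up front;
    rows that fall into any other partition are silently skipped."""
    wanted = set(requested)
    out: Dict[PartitionKey, List[Row]] = {key: [] for key in wanted}
    for row in rows:
        key = tuple(row[c] for c in partition_cols)
        if key in wanted:
            out[key].append(row)
    return out


def dynamic_partitions(
    rows: Iterable[Row], partition_cols: Sequence[str]
) -> Dict[PartitionKey, List[Row]]:
    """Dynamic style (hypothetical): partition values are discovered from the data."""
    out: Dict[PartitionKey, List[Row]] = defaultdict(list)
    for row in rows:
        out[tuple(row[c] for c in partition_cols)].append(row)
    return dict(out)


def bucket_id(value: object, num_buckets: int) -> int:
    """Map a value to one of num_buckets buckets with a deterministic hash.
    CRC32 is an illustrative stand-in, not Spark's bucketing hash."""
    return zlib.crc32(repr(value).encode("utf-8")) % num_buckets
```

For rows with a `dt` column spanning two days, `static_partitions(rows, ["dt"], [("2021-07-28",)])` returns only the requested day and drops the rest, while `dynamic_partitions(rows, ["dt"])` returns one group per day actually present. Bucketing differs in that the number of buckets is fixed in advance and every key lands in exactly one of them.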

Jul 28 '21 12:07 shay1bz
