dione
Consider DataSource v2
DataSource v2 in Spark 2.4+ is much more powerful than in Spark 2.3. The main issue in Spark 2.3 is that you basically need to implement everything yourself, and dealing with partitioning and bucketing is a headache. This is why we reverted to DataSource v1. Spark 2.4 adds lots of goodies, so we should consider moving when we upgrade. We'll have:
- natural support for save as table (as opposed to now when we do everything manually)
- easy support for dynamic partitioning. Right now we rely on the user giving a static list of partitions to index.
- maybe even real bucketing, instead of today's games of imitating this mechanism.