qbeast-spark
                        Add Convert To Qbeast
Currently, the only way to write data in the Qbeast format is to load it and write it again with the Spark DataFrame API.
It would be good to have easier ways to convert data from other formats to Qbeast, in a way that stays compatible with reading when no Qbeast metadata is found.
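For reference, the load-and-rewrite path described above looks roughly like the sketch below (not self-contained: it assumes a running Spark session with the qbeast-spark dependency on the classpath; the paths and column names are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Load the existing data (Parquet as an example source format)
val df = spark.read.format("parquet").load("/data/source-table")

// Rewrite it in Qbeast format; "columnsToIndex" selects the index columns
df.write
  .format("qbeast")
  .option("columnsToIndex", "user_id,price") // placeholder column names
  .save("/data/qbeast-table")
```

This works, but it duplicates the whole dataset, which is what a convert command would avoid.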
For that, we can think of two approaches:
- Write the data in the same place, but organized with the Qbeast index. If more data is added while the conversion is taking place, we treat that data as non-indexed and read all of it when needed.
- Write the data in the same place and mark it as replicated cubes, so that we only duplicate the data needed for optimization.
Doubts/things we need to figure out:
- How to specify the columns to index in the API
- How to handle partitioning? Would it be useful to index the columns that appear in the partition values?
- Study the feasibility of the second approach
- Study the integration with the Keeper
- Other design problems that could arise
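As a starting point for the API discussion, one possible shape for the command could mirror Delta Lake's `CONVERT TO DELTA`. The sketch below is entirely hypothetical: `ConvertToQbeastCommand` and all of its parameters are invented here for illustration and do not exist in the codebase.

```scala
// Hypothetical command, invented for illustration only.
// It would register the existing files under a Qbeast index in place,
// taking the columns to index explicitly (one of the open questions above).
ConvertToQbeastCommand(
  path = "/data/parquet-table",       // existing table location
  sourceFormat = "parquet",           // format of the files being converted
  columnsToIndex = Seq("user_id", "price")
).run(spark)
```

A SQL surface (e.g. something like `CONVERT TO QBEAST parquet.`/data/parquet-table`` with an index-columns clause) could be layered on top of the same command, as Delta does.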