qbeast-spark
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
PR draft for pre-commit hooks. Issue #321
## What went wrong?

Qbeast is not able to overwrite an existing delta table.

## How to reproduce?

```scala
// Create a delta table
df.write.format("delta").save(tablePath)
// Overwrite it with qbeast...
```
The input that `QbeastOptions.apply` receives from Spark is a `CaseInsensitiveMap`. Should `QbeastOptions.toMap` return an instance of `CaseInsensitiveMap` as well?
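The round-trip concern above can be illustrated without Spark. Below is a minimal, purely illustrative stand-in for a case-insensitive map (assumption: `CiMap`, `toPlainMap`, and the `columnsToIndex` key are chosen here for the example; the real class is Spark's `CaseInsensitiveMap`): lookups on the wrapper ignore case, but converting back to a plain `Map` silently loses that behavior.

```scala
// Illustrative stand-in for a case-insensitive map (NOT Spark's actual
// CaseInsensitiveMap; only mimics its lookup behavior for this argument).
final case class CiMap(underlying: Map[String, String]) {
  private val lowered = underlying.map { case (k, v) => (k.toLowerCase, v) }
  // Lookup is case-insensitive on the wrapper.
  def get(key: String): Option[String] = lowered.get(key.toLowerCase)
  // Converting back to a plain Map drops the case-insensitive semantics.
  def toPlainMap: Map[String, String] = underlying
}

val opts = CiMap(Map("columnsToIndex" -> "a,b"))
val viaWrapper = opts.get("COLUMNSTOINDEX")           // found despite casing
val viaPlain   = opts.toPlainMap.get("COLUMNSTOINDEX") // not found
```

If `toMap` returns a plain `Map`, any downstream code doing a differently-cased lookup hits the `viaPlain` case, which is an argument for returning a `CaseInsensitiveMap` from `QbeastOptions.toMap` as well.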
Table formats encapsulate write actions in an Optimistic Transaction. Several processes may try to commit their changes to the Transaction Log concurrently, but only one succeeds, forcing the others to retry...
Classes and methods, and their corresponding tests, are rendered redundant by algorithm changes such as `domain-driven double pass` and the latest changes introduced by `multi-block files`. For instance, `NormalizedWeight` should...
Investigating simple queries in the Spark UI, we detected that the Metadata time for the Qbeast datasource is larger than expected. Here's a comparison on a small (10-element) dataset...
From v0.6.0 onwards, the structure of the table is composed of files that contain multiple `blocks`, each belonging to the same or to different cubes. This is part of...
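The multi-block layout can be sketched with two small case classes (illustrative names only, not the actual qbeast-spark classes): one physical file holds several blocks, each tagged with the cube it belongs to, so resolving a cube means collecting (file, block) pairs across files.

```scala
// Illustrative model of the multi-block layout: a file contains blocks,
// and blocks of the same cube may live in different files.
final case class Block(cubeId: String, elementCount: Long)
final case class IndexFile(path: String, blocks: Seq[Block])

// A cube's data is now the set of matching blocks across all files.
def blocksForCube(files: Seq[IndexFile], cubeId: String): Seq[(String, Block)] =
  for {
    f <- files
    b <- f.blocks if b.cubeId == cubeId
  } yield (f.path, b)

val files = Seq(
  IndexFile("f1.parquet", Seq(Block("root", 100L), Block("c1", 50L))),
  IndexFile("f2.parquet", Seq(Block("c1", 25L)))
)
// Cube "c1" spans blocks in both f1.parquet and f2.parquet.
val c1 = blocksForCube(files, "c1")
```

This is the key shift from the pre-0.6.0 layout, where a file mapped to a single cube: readers must now address blocks within files rather than whole files.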
### WARNING: _Replication will be removed in version 0.6.0_

## Multiblock Format

The upcoming release of Qbeast Spark has [new protocol updates](https://github.com/Qbeast-io/qbeast-spark/blob/main-1.0.0/docs/QbeastFormat1.0.0.md#block-metadata-before-the-version-100). In this modification, we **change the layout of...
## What went wrong?

When enabling auto indexing, we call `SparkColumnsToIndexSelector` to choose the best columns for grouping the data. This selection is based on statistics and correlations...
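To make the correlation-based selection concrete, here is a pure-Scala sketch of the idea (assumption: `pearson`, `selectColumns`, and the greedy threshold strategy are invented for this example; the real `SparkColumnsToIndexSelector` computes its statistics through Spark): keep a column only if it is not strongly correlated with one already chosen, so the index spans genuinely distinct dimensions.

```scala
// Pearson correlation between two numeric columns of equal length.
def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
  val n  = xs.size
  val mx = xs.sum / n
  val my = ys.sum / n
  val cov = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
  val sx  = math.sqrt(xs.map(x => (x - mx) * (x - mx)).sum)
  val sy  = math.sqrt(ys.map(y => (y - my) * (y - my)).sum)
  cov / (sx * sy)
}

// Greedy selection: a column is kept only if its correlation with every
// already-selected column stays below `maxCorr`.
def selectColumns(cols: Seq[(String, Seq[Double])], maxCorr: Double = 0.9): Seq[String] = {
  val lookup = cols.toMap
  cols.foldLeft(Seq.empty[String]) { case (chosen, (name, values)) =>
    if (chosen.forall(c => math.abs(pearson(lookup(c), values)) < maxCorr))
      chosen :+ name
    else chosen
  }
}

val selected = selectColumns(Seq(
  "a" -> Seq(1.0, 2.0, 3.0, 4.0),
  "b" -> Seq(2.0, 4.0, 6.0, 8.0), // perfectly correlated with "a": dropped
  "c" -> Seq(4.0, 1.0, 3.0, 2.0)  // weakly correlated with "a": kept
))
```

A selector like this fails in exactly the situations this issue is about: degenerate statistics (constant columns, empty input) make the correlation undefined, so the real implementation needs guards around them.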
## What went wrong?

If we try to save an empty DataFrame with qbeast format, we throw the following error:

```scala
java.lang.RuntimeException: The DataFrame is empty, why are you trying...
```
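The failure mode can be modeled without Spark. In this hypothetical sketch (assumption: `EmptyDataFrameException` and `writeIndexed` are invented names, and a plain `Seq` stands in for the DataFrame), the writer rejects empty input up front, which is the behavior the error message above reflects:

```scala
// Hypothetical guard modeling the reported behavior: writing empty input
// throws instead of producing an empty (or no-op) table.
final case class EmptyDataFrameException(msg: String) extends RuntimeException(msg)

def writeIndexed[T](rows: Seq[T]): Int = {
  if (rows.isEmpty)
    throw EmptyDataFrameException("The DataFrame is empty; nothing to index")
  // ...index and write the rows; here we just report how many were written.
  rows.size
}

val written = writeIndexed(Seq(1, 2, 3)) // succeeds with non-empty input
val failedOnEmpty =
  try { writeIndexed(Seq.empty[Int]); false }
  catch { case _: EmptyDataFrameException => true }
```

Whether the right fix is to write an empty table, make the write a no-op, or keep throwing with a clearer message is exactly the design question this issue raises.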