Paola Pardo
Merged on https://github.com/Qbeast-io/qbeast-spark/pull/284
I have a question regarding this task: if we filter all the files with Delta, does it still make sense to filter again with Qbeast by min/max? For the...
And can you @alexeiakimov take care of this task? Thank you!
After discussion, agreed on: - When applying WHERE file filtering, let min/max Delta Skipping filter the set of files. - When applying SAMPLING, join both sets of files. This allows...
Wuuuu, we really need to work on this Revision flow... Opening an issue to redefine the steps.
Issue related to this #223
I cannot reproduce the error that you are experiencing. I've tried: - Version `1.0.0-SNAPSHOT` working with Spark 3.4.1 and Delta 2.4.0: fine. - Version `1.0.0-6a780ea1-SNAPSHOT` working with Spark 3.5.0 and...
1. The number of columns used to compute the stats **can be set with a table property from Delta**: `delta.dataSkippingNumIndexedCols`. Since it's a table property, you should create the table...
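As a rough illustration of the table property mentioned above, here is a minimal sketch that builds the SQL statement for setting `delta.dataSkippingNumIndexedCols` on an existing Delta table. The helper name `set_num_indexed_cols_sql` and the table name `events` are hypothetical, and the statement would still have to be passed to a Spark session with Delta enabled (for example through `spark.sql`), which is not shown here.

```python
def set_num_indexed_cols_sql(table: str, num_cols: int) -> str:
    """Build an ALTER TABLE statement for delta.dataSkippingNumIndexedCols.

    `table` and this helper are illustrative only; the property name is
    the Delta table property referenced in the comment above.
    """
    # Delta treats -1 as "collect stats for all columns"; anything lower is invalid.
    if num_cols < -1:
        raise ValueError("num_cols must be -1 or greater")
    return (
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '{num_cols}')"
    )


# Hypothetical table name, used only for illustration.
statement = set_num_indexed_cols_sql("events", 5)
print(statement)
```

Since statistics are only collected for files written after the property takes effect, setting it before loading data is likely the safer order.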
My initial thoughts on this: 1. IdentityTransformation should NOT be superseded by another IdentityTransformation. (By definition, the space value of Identity A is not considered in Identity B unless value...
In DatasourceV2 there's also the possibility to **build your own scan of the table**, with more options than the Datasource V1 (which we are currently using). Maybe it's worth to...