Paola Pardo
Great test @fpj !!! Thanks for giving it a try 💯 I would also suggest outputting some performance numbers for the sampling operation. Something like: `SELECT count(*) FROM table...
Yes @fpj , the behaviour is expected since the filters of sampling should be applied in the same way as before. Now, if this PR has shown improvement in WHERE...
For the record: this update would be for `main-1.0.0`
Some things I am running into with the version upgrade: - The `withNewTransaction` method from Delta is deprecated. They now require passing a series of new arguments as an `Option` to...
This feature is merged in `1.0.0-main`!
Merged on https://github.com/Qbeast-io/qbeast-spark/pull/284
FYI: we don't use the full PCA. We just partly analyze which columns contain higher variance, without having to create new mapping columns in the dataframe.
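To illustrate the idea of ranking columns by variance without building new mapped columns, here is a minimal sketch in plain Python. It is not the code from this PR: the function name `rank_columns_by_variance`, the dict-of-lists input, and the min-max scaling step (added so that columns with different units are comparable) are all assumptions made for the example.

```python
from statistics import pvariance


def rank_columns_by_variance(columns):
    """Return column names sorted from highest to lowest variance.

    `columns` is a hypothetical dict of column name -> list of numbers.
    Each column is min-max scaled to [0, 1] first (an assumption of this
    sketch), so that the ranking does not depend on the column's units.
    """
    def normalized_variance(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            # A constant column carries no variance at all.
            return 0.0
        scaled = [(v - lo) / (hi - lo) for v in values]
        return pvariance(scaled)

    return sorted(columns, key=lambda name: normalized_variance(columns[name]), reverse=True)


data = {
    "a": [1.0, 2.0, 3.0, 4.0],   # evenly spread values
    "b": [5.0, 5.0, 5.0, 5.0],   # constant column
    "c": [0.0, 0.0, 0.0, 10.0],  # one extreme value
}
ranking = rank_columns_by_variance(data)
print(ranking)
```

The original dataframe is only read, never extended: the output is just an ordering of the existing column names.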
I will add the corresponding skeleton to call the PCA method, and tomorrow we can work on integrating @SrTangente code.
> WIP, we use a correlation matrix to order the columns from lowest to highest average correlation and then filter the top n columns

Description updated, thanks!
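The ordering described in the quote could be sketched roughly as follows. This is an illustration under assumptions, not the WIP implementation: `pearson` and `least_correlated_columns` are hypothetical names, the average uses absolute Pearson correlation against every other column, and "top n" is read as the n columns with the lowest average correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def least_correlated_columns(columns, n):
    """Order columns from lowest to highest average |correlation| and keep the first n.

    `columns` is a hypothetical dict of column name -> list of numbers.
    """
    names = list(columns)
    avg_corr = {}
    for name in names:
        # Average absolute correlation against every *other* column.
        others = [abs(pearson(columns[name], columns[o])) for o in names if o != name]
        avg_corr[name] = sum(others) / len(others) if others else 0.0
    return sorted(names, key=lambda k: avg_corr[k])[:n]


data = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with x
    "z": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with x and y
}
selected = least_correlated_columns(data, 1)
print(selected)
```

In this toy input, `z` is the only column that is not redundant with another one, so it is the first to be selected.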
This feature is merged in `1.0.0-main`. Waiting for addition into main when the release is made.