Paola Pardo

Results 73 comments of Paola Pardo

Great test @fpj !!! Thanks for giving it a try 💯 I would also suggest outputting some performance numbers for the sampling operation. Something like: `SELECT count(*) FROM table...

Yes @fpj , the behaviour is expected since the filters of sampling should be applied in the same way as before. Now, if this PR has shown improvement in WHERE...

For the record: this update would be for `main-1.0.0`

Some things I am experiencing with the version upgrade: - The `withNewTransaction` method from Delta is deprecated. It now requires passing a series of new arguments as an `Option` to...

This feature is merged in `1.0.0-main`!

Merged on https://github.com/Qbeast-io/qbeast-spark/pull/284

FYI: we don't use the full PCA. We just partly analyze which columns contain higher variance, without having to create new mapping columns in the dataframe.
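To illustrate the idea of ranking columns by variance without projecting the data into new columns, here is a minimal sketch. It is not the qbeast-spark implementation: the function name `rank_columns_by_variance` and the min-max scaling step are assumptions added so that columns with different units can be compared.

```python
import numpy as np


def rank_columns_by_variance(data, column_names):
    """Rank columns from highest to lowest variance (hypothetical helper).

    Assumption: each column is min-max scaled to [0, 1] first, so the
    variances are comparable across columns with different ranges.
    No new columns are created; only a ranking is returned.
    """
    data = np.asarray(data, dtype=float)
    mins = data.min(axis=0)
    spans = data.max(axis=0) - mins
    # Constant columns have zero span; avoid dividing by zero.
    safe_spans = np.where(spans == 0, 1.0, spans)
    scaled = (data - mins) / safe_spans
    variances = scaled.var(axis=0)
    # Negate so that argsort yields descending variance.
    order = np.argsort(-variances, kind="stable")
    return [(column_names[i], float(variances[i])) for i in order]


if __name__ == "__main__":
    rows = [
        [0.0, 0.0, 5.0],
        [0.0, 10.0, 5.0],
        [0.0, 0.0, 5.0],
        [10.0, 10.0, 5.0],
    ]
    for name, variance in rank_columns_by_variance(rows, ["x", "y", "z"]):
        print(name, variance)
```

The columns at the head of the ranking would be the candidates to keep; the constant column `z` ends up last with zero variance.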

I will add the corresponding skeleton to call the PCA method, and tomorrow we can work on integrating @SrTangente code.

> WIP, we use a correlation matrix to order the columns from least to max average correlation and then filter the top n columns

Description updated, thanks!
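The ordering described in the quoted comment can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: the function name `select_least_correlated` is hypothetical, the average is taken over absolute Pearson correlations with the other columns, and "top n" is read as the first n columns of the least-to-max ordering.

```python
import numpy as np


def select_least_correlated(data, column_names, n):
    """Order columns by average absolute correlation and keep the first n.

    Hypothetical helper. Assumes at least two columns and that no column
    is constant (a constant column would yield NaN correlations).
    """
    data = np.asarray(data, dtype=float)
    if data.shape[1] < 2:
        raise ValueError("need at least two columns")
    # Pearson correlation matrix; columns are the variables.
    abs_corr = np.abs(np.corrcoef(data, rowvar=False))
    # Ignore each column's correlation with itself.
    np.fill_diagonal(abs_corr, 0.0)
    avg_corr = abs_corr.sum(axis=1) / (data.shape[1] - 1)
    # Ascending order: least correlated columns come first.
    order = np.argsort(avg_corr, kind="stable")
    return [column_names[i] for i in order[:n]]


if __name__ == "__main__":
    rows = [
        [1.0, 2.0, 1.0],
        [2.0, 4.0, -1.0],
        [3.0, 6.0, 1.0],
        [4.0, 8.0, -1.0],
        [5.0, 10.0, 1.0],
        [6.0, 12.5, -1.0],
    ]
    print(select_least_correlated(rows, ["a", "b", "c"], 1))
```

In this toy data `a` and `b` are almost perfectly correlated, so `c` has the lowest average correlation and is selected first.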

This feature is merged in `1.0.0-main`. Waiting for addition into main when the release is made.