Simon Brugman
Will look into this
Thanks @jodom961. The comments above provide some pointers to improve the PR.
Planned for the end of this year. Any contributions are welcome.
@skorski There is currently no branch for this. There used to be a version of pandas-profiling (PP) that ran on a Spark backend. That implementation used the `pyspark.sql` module to generate...
The working document for the implementation plan can be found here: https://github.com/pandas-profiling/pandas-profiling/wiki/Spark-Development-Plan. Contributions are welcome. (Thanks to @chanedwin)
@skorski The Slack community for pandas-profiling can be used for that: https://join.slack.com/t/pandas-profiling/shared_invite/zt-hfy3iwp2-qEJSItye5QBZf8YGFMaMnQ

@ahmedanis03 Thank you for the suggestion; we're also considering Koalas. The bulk of the work seems to be...
Hey @ncoish, yes, it's ongoing! @chanedwin has done the lion's share of the work needed for the Spark backend, which now needs to be integrated.
https://github.com/pandas-profiling/pandas-profiling/pull/670
Progress can be tracked on the GitHub project. An alpha version is planned for December 2021. In case anyone is interested in contributing, please reach out to @chanedwin (preferably via...
Good catch. This might have to do with how the backend is set in 3.1.0. There are two ways to check this: see if the same happens with 3.0.0, or set the backend...