spark-df-profiling
Version 1.1.13 Not Up to Date
Version 1.1.13 does not match the code on the master branch. It still references deprecated methods from its dependencies (such as Pandas tslib).
Version 1.1.13 was published more than three years ago. Some useful changes have been merged into master since then, including performance improvements (such as calling toPandas less often, and the patch I raised last week to speed up the correlation matrix calculation), but they have not been published.
@julioasotodv would you consider doing a new release to publish the latest code? Happy to help if you need a hand. Cheers.
@julioasotodv @XD-DENG I tried copying the latest master branch manually and test-running it on some data. However, the HTML report shows the wrong missing-value counts: they are all 0. You might need to check on this.
Was anybody able to circumvent the outdated release issue?
@nevinkjohn, my recommendation is to clone master, make the necessary changes, and build a wheel yourself. I did this and it worked well.
Another thing you could do, if you use pip, is pull directly from GitHub by putting this line in your requirements.txt file: git+https://github.com/julioasotodv/spark-df-profiling
instead of spark-df-profiling
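For anyone who has several requirements files to update, the swap described above could be scripted. This is only a rough sketch: the function name pin_to_master is made up for illustration, and the regex assumes the package is listed on its own line, optionally with a version specifier. For a one-off install, `pip install git+https://github.com/julioasotodv/spark-df-profiling` does the same thing without touching requirements.txt.

```python
import re

# The Git URL suggested in this thread; pip installs whatever is on the default branch.
GIT_SPEC = "git+https://github.com/julioasotodv/spark-df-profiling"

# Assumption: the requirement appears as the bare name, optionally followed
# by a version specifier such as ==1.1.13.
_REQ_LINE = re.compile(r"^\s*spark-df-profiling\s*([=<>!~].*)?$")


def pin_to_master(requirements_text: str) -> str:
    """Replace any spark-df-profiling requirement line with the Git URL."""
    out = []
    for line in requirements_text.splitlines():
        out.append(GIT_SPEC if _REQ_LINE.match(line) else line)
    return "\n".join(out)


if __name__ == "__main__":
    print(pin_to_master("pandas\nspark-df-profiling==1.1.13\nnumpy"))
```

Note that installing from the Git URL tracks whatever master contains at install time, so builds are not reproducible unless you append a commit reference (pip accepts `@<commit>` after the URL).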
+1 for a new release that picks up the more recent Pandas changes and the deprecation of ix though :eyes:
Thanks @BrunoGomesCoelho!
Is someone working on a new release fixing this issue?
My recommendation is to fork the library and make the changes yourself. Or, go to pandas-profiling and switch the references to pyspark.pandas
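If you go the fork route, a first step might be to find where the code still uses the deprecated Pandas features mentioned in this thread (the ix indexer and tslib). Here is a minimal sketch using only the standard library. The function name find_deprecated and the two patterns are my own assumptions based on what was named above, not a complete list of what needs changing.

```python
import re

# Assumed patterns, based only on the deprecations named in this thread.
DEPRECATED_PATTERNS = {
    "DataFrame.ix indexer": re.compile(r"\.ix\["),
    "pandas tslib reference": re.compile(r"\btslib\b"),
}


def find_deprecated(source: str):
    """Return (line_number, label, stripped_line) for each suspicious line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in DEPRECATED_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
    return hits


if __name__ == "__main__":
    sample = "x = df.ix[0]\nfrom pandas.tslib import Timestamp\n"
    for hit in find_deprecated(sample):
        print(hit)
```

You would read each .py file in your fork and pass its text to find_deprecated. In most cases ix can be replaced with loc (label-based) or iloc (position-based), but each hit needs a look to decide which one applies.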