spark-df-profiling

Version 1.1.13 Not Up to Date

Open claywey opened this issue 5 years ago • 8 comments

Version 1.1.13 does not match the code on the master branch. It still references deprecated methods from its dependencies (such as pandas tslib).

claywey avatar Nov 18 '19 17:11 claywey

1.1.13 was published more than three years ago. Some useful changes have been merged into master since then, including performance improvements (such as calling toPandas less often, and the patch I raised last week to speed up the correlation matrix calculation), but they have not been published.

@julioasotodv would you consider doing a new release to publish the latest code? Happy to help if you need a hand. Cheers.

XD-DENG avatar Dec 17 '19 01:12 XD-DENG

@julioasotodv @XD-DENG I tried copying the latest master branch manually and test-running it on some data. However, the HTML report shows the wrong missing-value counts: they are all 0. You might need to check this.

shhanani avatar Jan 08 '20 03:01 shhanani

Was anybody able to circumvent the outdated release issue?

nevinkjohn avatar Jun 10 '20 09:06 nevinkjohn

@nevinkjohn, my recommendation is to clone master, make the necessary changes, and build a wheel yourself. I did this and it worked well.

claywey avatar Jun 10 '20 16:06 claywey

Another thing you could do, if you use pip, is pull directly from GitHub by putting git+https://github.com/julioasotodv/spark-df-profiling in your requirements.txt file instead of spark-df-profiling.
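As a rough sketch of that suggestion, the snippet below builds the requirements.txt entry and the matching pip command. The helper names `requirement_line` and `pip_install_args` are hypothetical and only for illustration. The `@<ref>` suffix is pip's standard way to pin a Git branch, tag, or commit, which may be worth doing since master is unreleased and can change.

```python
import sys

# Repository URL as given in the comment above.
REPO = "git+https://github.com/julioasotodv/spark-df-profiling"


def requirement_line(repo=REPO, ref=None):
    """Build a requirements.txt entry, optionally pinned to a branch, tag, or commit."""
    return f"{repo}@{ref}" if ref else repo


def pip_install_args(requirement):
    """Arguments that run pip under the current interpreter."""
    return [sys.executable, "-m", "pip", "install", requirement]


print(requirement_line())              # unpinned: tracks the default branch
print(requirement_line(ref="master"))  # pinned to a named branch
# To install: subprocess.check_call(pip_install_args(requirement_line()))
```

Pinning to a specific commit hash rather than a branch name should keep installs reproducible if master moves.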

+1 for a new release with the more recent Pandas things and deprecation of ix though :eyes:

BrunoGomesCoelho avatar Aug 18 '20 15:08 BrunoGomesCoelho

Thanks @BrunoGomesCoelho!

victor-kironde avatar Jan 14 '21 21:01 victor-kironde

Is someone working on a new release fixing this issue?

prajal55 avatar May 19 '22 01:05 prajal55

> Is someone working on a new release fixing this issue?

My recommendation is to fork the library and make the changes yourself. Or go to pandas-profiling and switch the references to pyspark.pandas.

claywey avatar May 20 '22 04:05 claywey