
Spark dataframes integration?

Open echarso opened this issue 3 years ago • 6 comments

Thank you for this nice project. I was wondering whether you plan to integrate it with Spark DataFrames or support big data more generally. Apologies if that integration already exists and I couldn't find it.

echarso avatar Dec 08 '21 09:12 echarso

Hi @echarso, Thank you for this question.

Currently, the tool works only with pandas DataFrames or CSV files (CLI version). This means you can convert a Spark DataFrame to a pandas DataFrame and then run Evidently. In this scenario, it makes sense to work with a smaller data sample.
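As a rough illustration of that workaround, here is a minimal sketch of down-sampling a Spark DataFrame before collecting it to pandas. The helper name `sample_to_pandas` and its default values are hypothetical and not part of Evidently; it relies only on the PySpark `DataFrame.sample`, `DataFrame.limit`, and `DataFrame.toPandas` methods.

```python
def sample_to_pandas(spark_df, fraction=0.01, seed=42, max_rows=100_000):
    """Down-sample a Spark DataFrame and collect it as a pandas DataFrame.

    Hypothetical helper for illustration only; the defaults are arbitrary
    and should be tuned to the data size and the driver's memory.
    """
    # sample() keeps each row with probability `fraction`, so the result
    # size is approximate rather than exact.
    sampled = spark_df.sample(withReplacement=False, fraction=fraction, seed=seed)
    # Cap the row count so the collected result stays bounded on the driver.
    return sampled.limit(max_rows).toPandas()


# Assumed usage, given two existing Spark DataFrames:
# reference_pdf = sample_to_pandas(reference_sdf)
# current_pdf = sample_to_pandas(current_sdf)
```

The resulting pandas DataFrames can then be passed to Evidently in the usual way. Whether a random sample is representative enough depends on the data, so the fraction is worth checking against the full dataset.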

We are adapting the tool to larger amounts of data; this will be addressed in upcoming releases.

emeli-dral avatar Dec 28 '21 10:12 emeli-dral

Aloha @emeli-dral ,

I was wondering if there have been any updates or further discussion on plans for integration with Spark DataFrames? Extremely excited to see the library continue to grow :D

Cheers

lowballedintern avatar Aug 01 '22 05:08 lowballedintern

Hi @echarso, @lowballedintern, we are now starting to work on the beta for Spark integration. I was wondering if any of you'd be open to chatting about how you want to see that implemented?

If yes, feel free to stop by Discord https://discord.com/invite/xZjKRaNp8b, drop a line to [email protected], or maybe describe here how you'd see the ideal solution?

elenasamuylova avatar Oct 28 '22 13:10 elenasamuylova

Hi, may I know if there is any update on the Spark integration? Is there a timeline for this? Thank you!

luckyfgong avatar Jun 06 '23 03:06 luckyfgong

Is it possible to use Evidently with Spark DataFrames now? I have a huge amount of data in a Spark DataFrame, and converting it to a pandas DataFrame would be time-consuming. Are there other ways to integrate it? Let me know if I can use the beta version of the feature.

prity-k avatar Oct 12 '23 16:10 prity-k

Hi @prity-k,

Spark support is currently in development. If you want to test it before release, here are the instructions (it currently works for several data drift metrics): https://github.com/evidentlyai/evidently/pull/806

elenasamuylova avatar Oct 16 '23 10:10 elenasamuylova