Java / Scala bindings for Polars?

nicodv opened this issue 2 years ago • 6 comments

Problem description

Are there any ideas or plans for bindings for JVM-based languages such as Java, Kotlin, or Scala? Scala in particular might be of interest in combination with Spark: users could pull the distributed data into memory, similar to psdf.to_pandas() (ref), and continue their data analysis in Polars.

Or is the maintenance cost of yet another language binding prohibitive?

nicodv • Jan 18 '23 19:01

Since bindings are very time-consuming, they are primarily developed and maintained by the community. IIRC, there were some discussions in the Polars Discord channel about adding JVM bindings.

universalmind303 • Jan 18 '23 20:01

Internally, PySpark already seems to use the Arrow memory layout (it requires pyarrow to be installed), but I couldn't find whether it can consume an Arrow table directly (without the pandas conversion). Otherwise, you could just convert Polars DataFrames to pyarrow Tables (zero-copy for most data types, except Categoricals).

On Discord, someone wrote some crude Java bindings for a subset of the Polars API, but I don't think it is public.

ghuls • Jan 18 '23 20:01

but I don't find directly if they can consume arrow table directly (without the pandas conversion)

You can, and it's cheaper than going via pandas: https://stackoverflow.com/a/73205690/6717054

ritchie46 • Jan 18 '23 20:01

OK, seems like there is already a path for PySpark --> Polars. And since the bulk of Spark users nowadays use the Python API, there is likely not a great need to support JVM languages from that perspective.

Bindings for JVM languages will be a hefty community effort that I personally don't see taking off (given the lack of overlap between JVM languages and the ones that data scientists use).

Thanks for the quick answers!

nicodv • Jan 19 '23 01:01

seems like there is already a path for PySpark --> Polars

@nicodv Could you share a link to this?

ddanieltan • Jan 19 '23 02:01

I was referring to @ritchie46's Stack Overflow answer above, @ddanieltan.

nicodv • Jan 19 '23 04:01

There's https://github.com/chitralverma/scala-polars, but unfortunately it doesn't seem to be active.

cebaa • Jul 04 '23 21:07

I'm closing this as "not planned". Although we encourage the community to develop Polars bindings for other languages, we will not take the initiative on this ourselves. If there is significant interest, we can open a channel for it on our Discord in the "languages" section.

stinodego • Jan 26 '24 16:01