Java / Scala bindings for Polars?
Problem description
Are there ideas or plans around bindings for JVM-based languages like Java, Kotlin or Scala? Especially the latter might be of interest in combination with Spark. Users could pull the distributed data into memory, similar to psdf.to_pandas() (ref), and continue data analysis in Polars.
Or is the maintenance cost of yet another language binding prohibitive?
Since bindings are very time-consuming, they are primarily developed and maintained by the community. IIRC, there were some discussions in the Polars Discord channel about adding JVM bindings.
Internally, PySpark already seems to use the Arrow memory layout (it requires pyarrow to be installed), but I can't find whether it can consume an Arrow table directly (without the pandas conversion). If it can, you could just convert Polars DataFrames to pyarrow Tables (zero-copy for most data types, except Categoricals).
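For reference, the Polars-to-pyarrow conversion mentioned above can be sketched as follows. This is a minimal illustration, assuming both `polars` and `pyarrow` are installed; the function name `polars_arrow_roundtrip` and the sample columns are hypothetical and exist only for this example.

```python
def polars_arrow_roundtrip(columns):
    """Export a Polars DataFrame to a pyarrow Table and import it back.

    `columns` is a plain dict of column name -> list of values
    (a hypothetical input shape chosen for this sketch).
    """
    # Imported inside the function so the sketch can be loaded even
    # where polars / pyarrow are not installed.
    import polars as pl
    import pyarrow as pa

    df = pl.DataFrame(columns)

    # Polars -> pyarrow.Table
    table = df.to_arrow()
    assert isinstance(table, pa.Table)

    # pyarrow.Table -> Polars
    restored = pl.from_arrow(table)
    return table, restored


if __name__ == "__main__":
    table, restored = polars_arrow_roundtrip({"id": [1, 2, 3], "name": ["a", "b", "c"]})
    print(table.schema)
    print(restored)
```

Whether a given column is actually shared without copying depends on its data type, as noted in the comment above.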
On Discord, someone wrote some crude Java bindings for a subset of the Polars API, but I don't think it is public.
> but I don't find directly if they can consume arrow table directly (without the pandas conversion)
You can, and it's cheaper than going via pandas: https://stackoverflow.com/a/73205690/6717054
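One route that is often cited for Spark-to-Polars conversion via Arrow looks roughly like the sketch below. It may or may not match the linked answer exactly. It assumes that `_collect_as_arrow()` is available on the PySpark DataFrame; that is a private PySpark helper that returns a list of `pyarrow.RecordBatch` objects and may change between versions. The function name `spark_to_polars` is hypothetical.

```python
def spark_to_polars(spark_df):
    """Collect a PySpark DataFrame to the driver and wrap it as a Polars DataFrame.

    Assumption: `spark_df._collect_as_arrow()` exists and returns a list of
    pyarrow.RecordBatch objects (private PySpark API, not guaranteed stable).
    """
    # Imported inside the function so the sketch can be loaded even
    # where polars / pyarrow are not installed.
    import polars as pl
    import pyarrow as pa

    batches = spark_df._collect_as_arrow()
    if not batches:
        # Table.from_batches cannot infer a schema from an empty list,
        # so this sketch refuses an empty result instead of guessing one.
        raise ValueError("Spark returned no Arrow batches; cannot infer a schema")

    # Stitch the record batches into one Arrow table, then hand it to Polars.
    table = pa.Table.from_batches(batches)
    return pl.from_arrow(table)
```

Note that this collects the full dataset onto the driver, so it only suits data that fits in the memory of a single machine, the same constraint as the `to_pandas()` route mentioned in the problem description.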
OK, seems like there is already a path for PySpark --> Polars. And since the bulk of Spark users nowadays use the Python API there likely is not a great need to support JVM languages from that perspective.
Bindings for JVM languages will be a hefty community effort that I personally don't see taking off (given the lack of overlap between JVM languages and the ones that data scientists use).
Thanks for the quick answers!
> seems like there is already a path for PySpark --> Polars
@nicodv Could you share a link to this?
I was referring to @ritchie46's StackOverflow answer above, @ddanieltan.
There's https://github.com/chitralverma/scala-polars, but unfortunately it doesn't seem to be active.
I'm closing this as "not planned". Though we encourage the community to develop Polars bindings for other languages, we will not take the initiative for this. If there is significant interest, we can open a channel for it on our Discord in the "languages" section.