
Support dataframe interchange protocol (e.g. polars)

Open jdonaldson opened this issue 5 months ago • 5 comments

I have a quick and dirty ask: is it possible for jupyter-scatter to support the dataframe interchange protocol? That would let it accept pandas dataframes, polars dataframes, or anything else that follows the spec interchangeably.

https://arrow.apache.org/docs/python/interchange_protocol.html
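
For context, accepting any protocol-compliant dataframe on the consumer side could look roughly like the sketch below. It is not jupyter-scatter's code: `supports_interchange` and `to_pandas` are hypothetical helper names, and it assumes pandas 1.5 or newer, where `pandas.api.interchange.from_dataframe` is available.

```python
def supports_interchange(obj) -> bool:
    """Return True if obj exposes the dataframe interchange protocol."""
    return callable(getattr(obj, "__dataframe__", None))


def to_pandas(data):
    """Convert any interchange-compliant dataframe to a pandas DataFrame.

    Hypothetical helper for illustration only.
    """
    if not supports_interchange(data):
        raise TypeError(
            f"{type(data).__name__} does not implement __dataframe__"
        )

    # Imported lazily so the protocol check above works without pandas.
    import pandas as pd

    if isinstance(data, pd.DataFrame):
        return data  # already pandas, nothing to convert

    # Assumes pandas >= 1.5, which added this entry point.
    return pd.api.interchange.from_dataframe(data)
```

The check is pure duck typing, so the library itself would not need to import polars or any other dataframe package to accept their objects.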

jdonaldson avatar Jun 19 '25 03:06 jdonaldson

That should be possible, and I started looking into https://github.com/narwhals-dev/narwhals, but as usual it comes down to finding time to do the refactor. If you have time and are willing to look into this, I'm happy to help and get a PR in.

flekschas avatar Jun 19 '25 13:06 flekschas

I'm getting an LFS error here. It looks like the docs folder adds a lot of bloat that GitHub wants you to pay for. It might make sense to move the docs into a separate branch and build the documentation from there. Do you want me to take a swing at it? It will probably involve a rebase.

~/Projects
base ❯ git clone https://github.com/flekschas/jupyter-scatter.git
Cloning into 'jupyter-scatter'...
remote: Enumerating objects: 2605, done.
remote: Counting objects: 100% (608/608), done.
remote: Compressing objects: 100% (179/179), done.
remote: Total 2605 (delta 524), reused 436 (delta 429), pack-reused 1997 (from 2)
Receiving objects: 100% (2605/2605), 3.21 MiB | 8.44 MiB/s, done.
Resolving deltas: 100% (1539/1539), done.
Downloading docs/public/images/annotations-contour-by-dark.png (27 KB)
Error downloading object: docs/public/images/annotations-contour-by-dark.png (a1150db): Smudge error: Error downloading docs/public/images/annotations-contour-by-dark.png (a1150db6d103d45543c4678a8707eb522947e55afe25eb53abe05f33e9a45e38): batch response: This repository exceeded its LFS budget. The account responsible for the budget should increase it to restore access.

Errors logged to '/Users/jdonaldson/Projects/jupyter-scatter/.git/lfs/logs/20250619T103246.456465.log'.
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: docs/public/images/annotations-contour-by-dark.png: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

jdonaldson avatar Jun 19 '25 17:06 jdonaldson

Arghhh... that's annoying. I like having the docs next to the actual code. Having to jump back and forth between repos to document things would be annoying. Let me see if there's another way around this.

flekschas avatar Jun 19 '25 19:06 flekschas

Can you try again?

flekschas avatar Jun 19 '25 19:06 flekschas

Thanks, I'll check it out again soon. It works great for my use case: I'm comparing a bunch of neural network embedding spaces and linking them together with your brushing techniques. Feels like an interface that will become more common.

jdonaldson avatar Jun 19 '25 19:06 jdonaldson

Just popping in to say that the DataFrame interchange protocol seems to have stalled. Especially when you use Arrow internally (as it seems jupyter-scatter does), it is much simpler and more efficient to implement the Arrow PyCapsule Interface instead, which is supported by Polars and many other Python database clients and DataFrame libraries.

I can make a separate issue for this if you'd prefer.
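
As a rough sketch of the consumer side, under some assumptions: `supports_arrow_stream` and `to_arrow_table` are hypothetical names, not jupyter-scatter functions, and the conversion relies on a recent pyarrow in which `pa.table()` accepts objects that export `__arrow_c_stream__`.

```python
def supports_arrow_stream(obj) -> bool:
    """Return True if obj exports tabular data via the Arrow PyCapsule Interface."""
    return callable(getattr(obj, "__arrow_c_stream__", None))


def to_arrow_table(data):
    """Import any PyCapsule-compliant tabular object as a pyarrow Table.

    Hypothetical helper for illustration only.
    """
    if not supports_arrow_stream(data):
        raise TypeError(
            f"{type(data).__name__} does not implement __arrow_c_stream__"
        )

    # Imported lazily so the protocol check above works without pyarrow.
    import pyarrow as pa

    # Assumption: a pyarrow version whose pa.table() consumes
    # objects exposing __arrow_c_stream__.
    return pa.table(data)
```

Because the check only looks for the dunder method, the consumer needs no dependency on Polars or any other producer library.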

kylebarron avatar Oct 07 '25 14:10 kylebarron