jupyter-scatter
Support dataframe interchange protocol (e.g. polars)
I have a quick and dirty ask: is it possible for jupyter-scatter to support the dataframe interchange protocol? This would let it accept pandas DataFrames and Polars DataFrames (or anything else that follows the spec) interchangeably.
https://arrow.apache.org/docs/python/interchange_protocol.html
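A minimal sketch of what accepting interchange-protocol objects could look like, assuming pyarrow is available (the page linked above documents `pyarrow.interchange.from_dataframe`). The helper names `has_interchange_protocol` and `to_arrow_table` are hypothetical and not part of jupyter-scatter:

```python
from typing import Any


def has_interchange_protocol(obj: Any) -> bool:
    """Return True if obj exposes the DataFrame interchange protocol."""
    # Compliant libraries (pandas >= 1.5, Polars, pyarrow, ...) implement
    # a callable __dataframe__ method.
    return callable(getattr(obj, "__dataframe__", None))


def to_arrow_table(data: Any):
    """Convert any interchange-protocol dataframe into a pyarrow.Table.

    Hypothetical helper: pyarrow is imported lazily so that importing
    this module does not require it.
    """
    import pyarrow as pa
    import pyarrow.interchange

    if isinstance(data, pa.Table):
        return data  # already Arrow, nothing to convert
    if not has_interchange_protocol(data):
        raise TypeError(f"{type(data).__name__} does not implement __dataframe__")
    # Calls data.__dataframe__() under the hood and assembles an Arrow table,
    # copying buffers where a zero-copy conversion is not possible.
    return pa.interchange.from_dataframe(data)
```

With this in place, a Polars DataFrame and a pandas DataFrame would go through the same code path; calling `.to_pandas()` on the result covers any downstream code that still expects pandas.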
That should be possible. I started looking into https://github.com/narwhals-dev/narwhals, but as usual it comes down to finding time to do the refactor. If you have time and are willing to look into this, I'm happy to help and get a PR in.
I'm getting an LFS error here. It looks like the docs folder adds a lot of bloat that GitHub wants you to pay for. It might make sense to move the docs into a separate branch and build the documentation from there. Want me to take a swing at it? It will probably involve a rebase.
~/Projects
base ❯ git clone https://github.com/flekschas/jupyter-scatter.git
Cloning into 'jupyter-scatter'...
remote: Enumerating objects: 2605, done.
remote: Counting objects: 100% (608/608), done.
remote: Compressing objects: 100% (179/179), done.
remote: Total 2605 (delta 524), reused 436 (delta 429), pack-reused 1997 (from 2)
Receiving objects: 100% (2605/2605), 3.21 MiB | 8.44 MiB/s, done.
Resolving deltas: 100% (1539/1539), done.
Downloading docs/public/images/annotations-contour-by-dark.png (27 KB)
Error downloading object: docs/public/images/annotations-contour-by-dark.png (a1150db): Smudge error: Error downloading docs/public/images/annotations-contour-by-dark.png (a1150db6d103d45543c4678a8707eb522947e55afe25eb53abe05f33e9a45e38): batch response: This repository exceeded its LFS budget. The account responsible for the budget should increase it to restore access.
Errors logged to '/Users/jdonaldson/Projects/jupyter-scatter/.git/lfs/logs/20250619T103246.456465.log'.
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: docs/public/images/annotations-contour-by-dark.png: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
Arghhh... that's annoying. I like having the docs next to the actual code; jumping back and forth between repos to document things would be a pain. Let me see if there's another way around this.
Can you try again?
Thanks, I'll check it out again soon. It works great for my use case: I'm comparing a bunch of neural network embedding spaces and linking them together with your brushing techniques. It feels like an interface that will become more common.
Just popping in to say that the DataFrame interchange protocol seems to have stalled. Especially when you use Arrow internally (as jupyter-scatter seems to), it is much simpler and more efficient to implement the Arrow PyCapsule Interface instead, which is supported by Polars and many other Python database clients and DataFrame libraries.
I can open a separate issue for this if you'd prefer.
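A rough sketch of the PyCapsule route suggested above, assuming a recent pyarrow release whose `pa.table()` accepts any object exposing `__arrow_c_stream__`, and a recent Polars that implements it. The helpers `ingestion_path` and `as_arrow_table` are hypothetical; the sketch prefers the Arrow C stream and falls back to the interchange protocol only for objects that lack it:

```python
from typing import Any


def ingestion_path(data: Any) -> str:
    """Decide how to turn `data` into Arrow (hypothetical helper).

    Prefers the Arrow PyCapsule stream interface and falls back to the
    DataFrame interchange protocol for objects that only implement that.
    """
    if callable(getattr(data, "__arrow_c_stream__", None)):
        return "pycapsule"
    if callable(getattr(data, "__dataframe__", None)):
        return "interchange"
    raise TypeError(
        f"{type(data).__name__} implements neither __arrow_c_stream__ "
        "nor __dataframe__"
    )


def as_arrow_table(data: Any):
    """Convert `data` into a pyarrow.Table; pyarrow is imported lazily."""
    import pyarrow as pa

    if ingestion_path(data) == "pycapsule":
        # Assumption: pa.table() consumes the Arrow C stream exported by
        # __arrow_c_stream__, so the data is not rebuilt through pandas.
        return pa.table(data)

    import pyarrow.interchange

    # Fallback for libraries that only speak the interchange protocol.
    return pa.interchange.from_dataframe(data)
```

The design choice is simply to try the lower-overhead protocol first; in recent pyarrow releases a `pyarrow.Table` itself exposes `__arrow_c_stream__`, so Arrow input takes the first branch too.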