Feature: Example Level-of-Detail
Excellent tool in a really important space: helping people understand big datasets!
Feature idea: A new layer option that shows examples at sparse peaks of the density field.
This is a variant of Marker Clustering or Photo Marker Clustering.
Motivation: Like neural nets, we learn well by example. Sometimes, examples trump summaries. When looking at a crowd of people, we start to understand the distribution by seeing examples. Embedding visualizations can leverage this way that our brains work.
The user would be able to zoom around the density plot and scan lots of examples. LoD logic would mean you could see examples at various levels of detail without clicks.
Thinking about the implementation... With some heuristic to find peaks in the density field, the embedding-atlas could show some number of examples (say, 15) at each level of zoom in a preview box. If the user clicks the preview box, it could open the full table associated with the example in the bottom panel.
I am sure this would require significant effort - you'd need an algorithm to find local peaks from the density, some zoom-level update logic, and a custom preview component. However, it seems interesting enough as an interaction pattern that I felt it's worth proposing. Let me know what you think! And thanks again for sharing this project.
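To make the peak-finding idea concrete, here's a rough sketch (not the embedding-atlas API; the grid representation, function name, and the choice of `k` are all illustrative): treat the density field as a binned 2D grid, keep cells that are strict local maxima over their 8-neighborhood, and take the top-k by density for the current zoom level.

```python
def local_peaks(density, k=15):
    """Return up to k (row, col) cells that are strict local maxima
    of a 2D density grid, ordered by density descending."""
    rows, cols = len(density), len(density[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = density[r][c]
            if v <= 0:
                continue
            # Compare against the up-to-8 neighboring cells.
            neighbors = [
                density[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v > n for n in neighbors):
                peaks.append((v, r, c))
    peaks.sort(reverse=True)  # highest density first
    return [(r, c) for _, r, c in peaks[:k]]
```

In a real implementation you'd likely re-bin (or re-query) the density at each zoom level so the peaks, and therefore the previewed examples, change smoothly as the user zooms.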
Could you create a few sketches of this idea? Embedding Atlas already shows the actual points even when you look at density, and it does show you individual points when you zoom in.
https://github.com/user-attachments/assets/04470607-20f7-4024-8e72-73a4c7f9dae6
I often don't like the clustering on maps since it's often too aggressive and places data away from where it actually is on the map.
Or are you suggesting that the table could maybe sync with the data that is currently in view (at least scrolling to it or highlighting it)?
@domoritz Yes, I will work on a few sketches.
For text, my suggestion is very close to what already exists. It would just replace the summary words with actual examples centered on that point. It's a good point that there's a tradeoff:
- Summary text can more faithfully represent a group of points: one example may be "central" to many points, but its full text can have features that aren't representative of the group
- Summary text is terse, so you can still see details in the density plot behind it, and you can fit many summaries on the screen.
- However, example text avoids the summarization step, which can miss features like style that may be influencing the neighborhood of points
For images, it's more obvious what this would add. If the images don't have captions, and if CLIP embeddings won't give meaningful captions, there's no great way to get summary text. Also, there's a "picture is worth 1000 words" thing going on here. Plotting example images (again, with similar zoom interactions to the summary text that gets plotted) would give a really powerful way to zoom around an image dataset. There is still the "over-implication" problem (related to your concern about over-aggressiveness) where one example may or may not be highly representative of its neighbors. There's also the challenge that plotting image previews will obscure the density map.
I'm not suggesting dynamically filtering the table to the view center. However, let's think about that. If you had an indicator on the density plot (e.g. highlighted points) to show where the table is filtered, it could give a nice way to zoom around. But there's a cognitive task in always relating the table to the densities. If you instead show examples on top of the density, you can give intuitions about the densities in their own spatial reference.
I'll work on sketches. Thanks for responding!
I see. Yeah, for images I can see examples being useful. For text, we could maybe show the first n characters of a sample.
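A minimal sketch of that snippet idea (the function name and default budget are made up): truncate the sampled text to a display budget at a word boundary and append an ellipsis, so labels stay terse enough to leave the density plot visible.

```python
def snippet(text, n=40):
    """Truncate text to at most n characters, breaking at a word
    boundary where possible, and mark the cut with an ellipsis."""
    if len(text) <= n:
        return text
    cut = text[:n].rsplit(" ", 1)[0]  # drop a trailing partial word
    return cut + "…"
```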
Here's a low-fidelity mock-up for the image use case. The text use case could look the same, but with snippets as you mention.
https://github.com/user-attachments/assets/0c2e9791-ea34-4c91-a3da-2365877bc95d
Thanks for the example! Sort of like https://poloclub.github.io/wizmap/?dataset=diffusiondb but with showing images not only on hover.
I don't know when we get to it but I'll leave the issue here for the idea.
Yes, so it would be more like seeing several landmarks to understand a landscape and less like perceiving points along a path (the mouse path, in this case). The benefit would be greater for images than for text, I think, especially if the text is easy to summarize.
The wizmap hover interaction is similar to what I get with Embedding Atlas with a data-url image column:
https://github.com/user-attachments/assets/04822de4-0e51-4382-b50c-a71aed0c4c6f
Thanks for the discussion - I do imagine it's a larger feature to implement.