Add support for images
Thanks for the great library. I went ahead and added some support for images. The images app also has 2 new features that are particularly relevant: the circle size and the row height (in case the images do not fit nicely). Future PRs will include the ability to represent images in a grid (not a DataTable), but I am not sure how to do that as I just started learning Bokeh last night. Any help here would be great.
I gave it a quick review while on the train. I will try to give it a better review tomorrow when I'm back home.
For the future though, it would be better if you first make an issue so that we may discuss the direction of the feature before you work on a PR. That way, we might align before any code is written.
You got it! Sorry about that. I wasn't initially intending to make it a PR, but in the future I will write issues first.
So first of all, I do like the direction of the PR. I was planning on doing images too, so it's nice to see that other folks had the same idea.
I think my main feedback now is that, apart from your own todo-list, we need some sort of docs with an example. Because right now, I'm missing something that I can copy/paste so that I can try it out locally.
How did you encode the images? Did you use VGG16? Might it be possible to have a demo that uses images from the picsum service? I'm happy to think along, but in order for folks to try out the example, we will need to think about this.
I'm currently thinking that a demo that uses MobileNet might be fun? Open to suggestions though.
Let's work on this before doing more work on the UI. I might also have some comments on the slides, but I think having a good demo is a thing we should tackle before that.
Thanks for the kind and thoughtful comments. These images were encoded with CLIP, but I have another local example that uses VGG16. More than happy to put together an example from start to finish next week. I see images as being a faster way of implementing PixPlot. I am flying to South Africa Wednesday, so I will try to do some of the documentation and example work Monday/Tuesday.
Okay I have two examples ready to go from beginning to end.
One is for the sciences, using botanical sample images that sit on an image server. This is straightforward to use with bulk: the images are hosted on a website, so the page served by Bokeh's Server class at localhost can pull each one directly through an HTML img tag whose src points to the image URL.
The second example is more complicated. It uses local images which I scraped from e-codices (medieval manuscripts). We absolutely could just point to the URLs, but the point of this example is to show how to work with local images. Since these files are local (something a user would realistically want), it means that we need to make sure the Server class has access to them. I have a working script but it requires a user to run bokeh serve to create a directory-based server. Do you know of a way to use the Server class (currently implemented in bulk) to have access to local image files in a user-specified directory?
I am flying to South Africa tomorrow so I won't be able to respond until the weekend.
If I think about the "average user" then I imagine it's far more likely that you'll be working with images on disk. You'll need to have them in memory if you want to make embeddings. So that is making me think that it may be best to assume a directory of images.
So it feels like we'll need to explore how Bokeh can serve the images from a local folder, but I gotta admit that I've never done that before. There are StackOverflow answers that suggest it's possible to get a local folder to work though.
I spent a few hours yesterday going through the docs on the Server class and StackOverflow, and I came away with the idea that it was possible, but I couldn't get it to work properly. If you like, I will send you my working example of e-codices and local files so that maybe you can figure out how best to implement it in bulk?
I have an idea where we create the entire application directory for the user, including the static subfolder, and automatically copy the images from the specified directories into that static subdirectory. We could then use sys or some other way to run the CLI command python -m bokeh serve myapp --show.
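The directory idea above can be sketched as follows. This is a minimal sketch, not what bulk does: the helper names `build_app_dir` and `serve_command` are hypothetical, and it assumes Bokeh's directory-style apps, where files in a `static` subfolder are reachable under `/<appname>/static/`. A real app directory would also need a `main.py`, which this sketch does not create.

```python
import shutil
import sys
from pathlib import Path

# Assumption: only these suffixes count as images for this sketch.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif"}


def build_app_dir(app_dir, image_dirs):
    """Create `app_dir/static`, copy images into it, and return their URL paths.

    Hypothetical helper. Files with the same name in different source
    directories overwrite each other here; a real version would need to
    de-duplicate names.
    """
    app_dir = Path(app_dir)
    static_dir = app_dir / "static"
    static_dir.mkdir(parents=True, exist_ok=True)

    urls = []
    for image_dir in image_dirs:
        for src in sorted(Path(image_dir).iterdir()):
            if src.is_file() and src.suffix.lower() in IMAGE_SUFFIXES:
                shutil.copy2(src, static_dir / src.name)
                # Path under which a directory-style Bokeh app exposes static files.
                urls.append(f"/{app_dir.name}/static/{src.name}")
    return urls


def serve_command(app_dir):
    """Build (but do not run) the CLI command mentioned in the comment above."""
    return [sys.executable, "-m", "bokeh", "serve", str(app_dir), "--show"]
```

Passing the result of `serve_command("myapp")` to `subprocess.run` would start the server, assuming Bokeh is installed in the same interpreter.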
https://github.com/wjbmattingly/e-codices-modeling
The docs are a bit lacking, but hopefully it is enough to get you started.
Cool. I can't promise when I'll have a look, but it's nice to know it's around 👍
Cool! No rush at all. I am moving this to the back burner for right now while I finish up a few other projects.
Note! While still experimental, I've started work on embetter. It's a library that should make it easier to just grab and toy around with some useful embeddings and it offers support for TorchVision models. It's in alpha now, but I figured it couldn't hurt to share.
I'm not announcing it just yet, but it can be very helpful when exploring embeddings for images.
Thanks for this! I just had a look and this looks like a much cleaner implementation than what we were considering at SI. We were looking at docarray. It has a very similar idea.
I will play around with embetter a bit this week with the e-codices project.
I'm slowly getting back from paternity leave and I'm interested in getting this feature in.
I tried running what's currently here and hit an error though.
I made a file called imgstarred.csv with the following contents:
path,x,y
screenshot.png,0.0,1.0
screenshot-images.JPG,1.0,0.0
And I tried running bulk via:
python -m bulk images bhl_umap_subsample.csv
This led to the following error:
About to serve `bulk` over at http://localhost:5006/.
Uncaught exception GET / (127.0.0.1)
HTTPServerRequest(protocol='http', host='localhost', method='GET', uri='/', version='HTTP/1.0', remote_ip='127.0.0.1')
Traceback (most recent call last):
  File "/opt/python/3.10.4/lib/python3.10/site-packages/tornado/web.py", line 1713, in _execute
    result = await result
  File "/home/codespace/.local/lib/python3.10/site-packages/bokeh-3.0.0.dev15-py3.10.egg/bokeh/server/views/doc_handler.py", line 54, in get
    session = await self.get_session()
  File "/home/codespace/.local/lib/python3.10/site-packages/bokeh-3.0.0.dev15-py3.10.egg/bokeh/server/views/session_handler.py", line 145, in get_session
    session = await self.application_context.create_session_if_needed(session_id, self.request, token)
  File "/home/codespace/.local/lib/python3.10/site-packages/bokeh-3.0.0.dev15-py3.10.egg/bokeh/server/contexts.py", line 242, in create_session_if_needed
    self._application.initialize_document(doc)
  File "/home/codespace/.local/lib/python3.10/site-packages/bokeh-3.0.0.dev15-py3.10.egg/bokeh/application/application.py", line 194, in initialize_document
    h.modify_document(doc)
  File "/home/codespace/.local/lib/python3.10/site-packages/bokeh-3.0.0.dev15-py3.10.egg/bokeh/application/handlers/function.py", line 143, in modify_document
    self._func(doc)
  File "/workspaces/bulk-images/bulk/images.py", line 68, in bkapp
    p.plot_width = 500
  File "/home/codespace/.local/lib/python3.10/site-packages/bokeh-3.0.0.dev15-py3.10.egg/bokeh/core/has_props.py", line 304, in __setattr__
    self._raise_attribute_error_with_matches(name, properties)
  File "/home/codespace/.local/lib/python3.10/site-packages/bokeh-3.0.0.dev15-py3.10.egg/bokeh/core/has_props.py", line 339, in _raise_attribute_error_with_matches
    raise AttributeError(f"unexpected attribute {name!r} to {self.__class__.__name__}, {text} attributes are {nice_join(matches)}")
AttributeError: unexpected attribute 'plot_width' to figure, similar attributes are outer_width, width or min_width
500 GET / (127.0.0.1) 56.80ms
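The traceback above shows that `figure` in this Bokeh 3.0 dev build rejects `plot_width` and suggests `width` instead, so the most direct fix is probably `p.width = 500` on line 68 of `bulk/images.py`. If both older and newer Bokeh versions need to keep working, one option is a small fallback setter. The sketch below is an assumption about how that could look: `set_first_supported` is a hypothetical helper, and `StrictFigure` is only a stand-in so that the snippet runs without Bokeh installed.

```python
def set_first_supported(obj, names, value):
    """Set `value` on the first attribute in `names` that `obj` accepts.

    Hypothetical helper: it relies on the object raising AttributeError
    for unknown attributes, as the Bokeh figure in the traceback does.
    Returns the attribute name that was actually set.
    """
    for name in names:
        try:
            setattr(obj, name, value)
        except AttributeError:
            continue
        return name
    raise AttributeError(
        f"none of {tuple(names)!r} is accepted by {type(obj).__name__}"
    )


class StrictFigure:
    """Stand-in for a figure (not a Bokeh class): rejects unknown attributes."""

    __slots__ = ("width", "height")


fig = StrictFigure()
# Try the older name first, then the name the error message suggests.
used = set_first_supported(fig, ("plot_width", "width"), 500)
print(used, fig.width)
```

With a real figure the call would be `set_first_supported(p, ("plot_width", "width"), 500)`; whether that fallback is worth it over simply requiring a recent Bokeh is a design choice for the project.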
I don't mind diving into this one, but before I do, @wjbmattingly do you have time/interest/availability to continue working on the image feature?
I've started my own branch due to the radio silence. It also works via a different mechanism. Because we can't assume the folder is in a certain position, I'm base64-encoding all images in the dataframe directly.
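For readers unfamiliar with the base64 approach, here is a minimal sketch of the general technique, not the code from that branch. The function names are hypothetical. Each image file is read from disk and turned into a `data:` URI, so an img tag can show it without the server needing access to the folder.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path):
    """Read an image file and return it as a base64 `data:` URI (hypothetical helper)."""
    path = Path(path)
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"{path} does not look like an image file")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_to_img_tag(path, height=100):
    """Wrap the data URI in an HTML img tag; `height` is in pixels."""
    return f'<img src="{image_to_data_uri(path)}" height="{int(height)}">'
```

With a dataframe that has a `path` column, as in the CSV above, something like `df["image"] = df["path"].map(image_to_img_tag)` would store the tags alongside the coordinates. The trade-off is size: base64 makes each image roughly a third larger, and everything is sent to the browser up front.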
Your work inspired mine, so definitely let me know if you'd appreciate another way of receiving contribution credit.