Simon Willison
Basic Gemini example from https://github.com/simonw/llm-gemini/blob/4195c4396834e5bccc3ce9a62647591e1b228e2e/llm_gemini.py (my `images` branch):

```python
messages = []
if conversation:
    for response in conversation.responses:
        messages.append(
            {"role": "user", "parts": [{"text": response.prompt.prompt}]}
        )
        messages.append({"role": "model", "parts": [{"text": response.text()}]})
...
```
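A self-contained sketch of the same history-building idea, runnable without the `llm` library. The `Exchange` dataclass and `build_messages` function are hypothetical stand-ins (the real code reads `response.prompt.prompt` and `response.text()`), and appending the new prompt as the final user turn is an assumption, since the snippet is cut off before that point.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    # Hypothetical stand-in for one earlier prompt/response pair.
    prompt: str
    reply: str


def build_messages(history, new_prompt):
    """Turn prior exchanges plus a new prompt into Gemini-style role/parts dicts."""
    messages = []
    for exchange in history:
        # Each earlier turn becomes a "user" message followed by a "model" message.
        messages.append({"role": "user", "parts": [{"text": exchange.prompt}]})
        messages.append({"role": "model", "parts": [{"text": exchange.reply}]})
    # Assumption: the new prompt goes last, as another "user" turn.
    messages.append({"role": "user", "parts": [{"text": new_prompt}]})
    return messages


messages = build_messages([Exchange("Hi", "Hello!")], "Describe this image")
```

The alternating `user`/`model` roles are what let the model see the earlier conversation as context for the new prompt.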
Example from Google AI Studio:

```bash
API_KEY="YOUR_API_KEY"

# TODO: Make the following files available on the local file system.
FILES=("image.jpg")
MIME_TYPES=("image/jpeg")

for i in "${!FILES[@]}"; do
  NUM_BYTES=$(wc -c < "${FILES[$i]}")
...
```
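For small images, the Gemini REST API also accepts image bytes inline in the request body, base64-encoded, rather than as a separate file upload. This is a swapped-in alternative, not what the AI Studio script does. The sketch below only builds the JSON body; it assumes the `inline_data` part with `mime_type` and `data` fields, and `build_request` is a hypothetical helper name. Passing several `(bytes, mime_type)` pairs, mirroring the `FILES` / `MIME_TYPES` arrays, yields several image parts in one request.

```python
import base64
import json


def build_request(prompt, files):
    """Build a generateContent-style JSON body with images inlined as base64.

    `files` is a list of (raw_bytes, mime_type) pairs, mirroring the
    FILES / MIME_TYPES arrays in the shell script.
    """
    parts = [{"text": prompt}]
    for raw, mime_type in files:
        parts.append({
            "inline_data": {
                "mime_type": mime_type,
                # Encode the raw bytes as base64, then decode to str for JSON.
                "data": base64.b64encode(raw).decode("ascii"),
            }
        })
    return {"contents": [{"parts": parts}]}


# Three bytes of a JPEG header stand in for a real image file here.
body = build_request("Describe this image", [(b"\xff\xd8\xff", "image/jpeg")])
payload = json.dumps(body)
```

Putting the text part first is a choice made for this sketch; the parts are simply a list, so images could come first instead.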
Here's Gemini Pro accepting multiple images at once: https://ai.google.dev/gemini-api/docs/vision?lang=python#prompt-multiple

```python
import google.generativeai as genai
import PIL.Image

sample_file = PIL.Image.open('sample.jpg')
sample_file_2 = PIL.Image.open('piranha.jpg')
sample_file_3 = PIL.Image.open('firefighter.jpg')

model = genai.GenerativeModel(model_name="gemini-1.5-pro")

prompt = (
    "Write an advertising...
```
I just saw that Gemini has been trained to return bounding boxes: https://ai.google.dev/gemini-api/docs/vision?lang=python#bbox

I tried this:

```pycon
>>> import google.generativeai as genai
>>> genai.configure(api_key="...")
>>> model = genai.GenerativeModel(model_name="gemini-1.5-pro-latest")
>>> import PIL.Image
...
```
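The model's answer comes back as text, so the coordinates have to be extracted before they can be drawn. A minimal sketch, assuming the reply contains bracketed lists of four integers such as `[255, 473, 800, 910]`, possibly surrounded by other prose; `parse_boxes` is a hypothetical helper, not part of the `google.generativeai` library.

```python
import re

# Matches a bracketed list of exactly four integers, e.g. [255, 473, 800, 910].
BOX_RE = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def parse_boxes(text):
    """Return every four-integer box found in the model's text reply."""
    # findall returns one tuple of four digit strings per match.
    return [[int(n) for n in match] for match in BOX_RE.findall(text)]


reply = "Here you go:\n[255, 473, 800, 910]\n[96, 63, 700, 390]"
boxes = parse_boxes(reply)
```

A regex is more forgiving than `json.loads` here because it still works when the reply wraps the boxes in explanatory text; nested lists like `[[1, 2, 3, 4], [5, 6, 7, 8]]` are handled too, since the pattern matches each inner list.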
I don't think those bounding boxes are in the right places. I built a Claude Artifact to render them, and I may not have built it right, but I got...
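The Claude Artifact itself isn't shown here. As a rough Python stand-in, this sketch writes an SVG that overlays rectangles on an image of known size. It assumes the boxes are already in pixel coordinates as `(x0, y0, x1, y1)`; `boxes_to_svg` and the `photo.jpg` filename are hypothetical.

```python
from xml.sax.saxutils import quoteattr


def boxes_to_svg(image_href, width, height, boxes):
    """Return SVG markup drawing each (x0, y0, x1, y1) pixel box over an image."""
    lines = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'  <image href={quoteattr(image_href)} width="{width}" height="{height}"/>',
    ]
    for x0, y0, x1, y1 in boxes:
        # An SVG <rect> is positioned by its top-left corner plus width/height.
        lines.append(
            f'  <rect x="{x0}" y="{y0}" width="{x1 - x0}" height="{y1 - y0}" '
            'fill="none" stroke="red" stroke-width="3"/>'
        )
    lines.append("</svg>")
    return "\n".join(lines)


# Hypothetical 800x600 image with one box in pixel coordinates.
svg = boxes_to_svg("photo.jpg", 800, 600, [(100, 50, 300, 250)])
print(svg)
```

Saved as a `.svg` file next to the image, this opens in a browser, which makes it quick to see whether a given coordinate interpretation lines up with the objects in the photo.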
Tried it again with this photo of goats and got a slightly more convincing result:

```pycon
>>> goats = PIL.Image.open("/tmp/goats.jpeg")
>>> prompt = 'Return...
```
Oh! I tried different coordinate formats, and it turned out this one rendered correctly:

```
[255, 473, 800, 910]
[96, 63, 700, 390]
```

Rendered:

```pycon
>>> heron = PIL.Image.open("/tmp/heron.jpeg")
>>> prompt = 'Return bounding boxes around every heron,...
```
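Google's Gemini vision documentation describes bounding boxes as `[ymin, xmin, ymax, xmax]`, normalized to a 0-1000 scale regardless of the image's real size. Assuming that convention (the comment above doesn't say which format worked), a box can be converted to pixel coordinates as in this sketch; `to_pixels` is a hypothetical helper.

```python
def to_pixels(box, width, height):
    """Convert [ymin, xmin, ymax, xmax] on a 0-1000 scale to pixel (x0, y0, x1, y1).

    Assumes the normalized 0-1000 convention; width and height are the
    image's actual pixel dimensions.
    """
    ymin, xmin, ymax, xmax = box
    # Multiply before dividing so integer inputs scale without float drift.
    return (
        round(xmin * width / 1000),
        round(ymin * height / 1000),
        round(xmax * width / 1000),
        round(ymax * height / 1000),
    )


# Hypothetical 2000x1000 image.
print(to_pixels([255, 473, 800, 910], 2000, 1000))  # → (946, 255, 1820, 800)
```

Note the reordering: the normalized box lists y before x, while most drawing APIs expect x first, which is an easy source of boxes landing in the wrong places.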
Based on all of that, I built this tool: https://tools.simonwillison.net/gemini-bbox

You have to paste in a Gemini API key when you use it, which gets stashed in `localStorage` (like my...