Bounding Box Detection

Open batu opened this issue 2 months ago • 0 comments

Hello!

After seeing that Sonnet is trained for computer use (with exact pixel coordinates), I tried using it for bounding box detection, both open-vocabulary with text input and few-shot with image input. However, my results have been worse than I expected given Claude's performance with computer use. I tried following the best practices outlined in this repo.

My questions are:

  1. Can you share which specific coordinate normalization and origin location Claude was trained with for computer use, so I can use the same setup?
  2. Do you have any bounding box grounding suggestions I should try beyond what is given in the cookbooks?
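
To make question 1 concrete, here is a minimal sketch of the kind of convention I am asking about: whether coordinates are normalized to [0, 1] or given in pixels, and whether the origin is the top-left or bottom-left corner. The function names, the `(x_min, y_min, x_max, y_max)` ordering, and the `origin` parameter are my own illustrative assumptions, not anything documented for Claude.

```python
from typing import Tuple

# Assumed box layout for illustration: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def normalized_to_pixels(
    box: Box, width: int, height: int, origin: str = "top-left"
) -> Tuple[int, int, int, int]:
    """Convert a [0, 1]-normalized box to pixel coordinates with a top-left origin."""
    x0, y0, x1, y1 = box
    if origin == "bottom-left":
        # Flip the y axis so the result always uses a top-left origin.
        y0, y1 = 1.0 - y1, 1.0 - y0
    elif origin != "top-left":
        raise ValueError(f"unsupported origin: {origin!r}")
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))


def pixels_to_normalized(
    box: Tuple[int, int, int, int], width: int, height: int
) -> Box:
    """Convert a top-left-origin pixel box to [0, 1]-normalized coordinates."""
    x0, y0, x1, y1 = box
    return (x0 / width, y0 / height, x1 / width, y1 / height)
```

Knowing which of these conventions the model expects would let me convert my annotations and the model's outputs consistently before comparing them.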

Thank you very much!

batu, Dec 19 '24 20:12