anthropic-cookbook
Bounding Box Detection
Hello!
After seeing that Sonnet is trained for computer use (with exact pixel coordinates), I tried using it for bounding box detection (both open-vocabulary with text input and few-shot with image input). However, my results have been worse than I expected given Claude's performance with computer use. I followed the best practices outlined in this repo.
My questions are:
- Can you share which coordinate normalization and origin location Claude was trained with for computer use, so I can use the same setup?
- Are there any bounding box grounding suggestions I should try beyond what is given in the cookbooks?
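To make the first question concrete, here is a minimal sketch of the kind of conventions I mean. Everything in it is an assumption for illustration only: the helper names are hypothetical, and the 0-1 normalization and top-left origin are just one possible setup, not something I know Claude to use.

```python
from typing import Tuple

# (x_min, y_min, x_max, y_max); units depend on the convention in use.
Box = Tuple[float, float, float, float]


def normalized_to_pixels(box: Box, width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a box given in 0-1 normalized coordinates to integer pixel coordinates.

    Assumes (hypothetically) that x is normalized by image width and y by image height.
    """
    x0, y0, x1, y1 = box
    return (round(x0 * width), round(y0 * height), round(x1 * width), round(y1 * height))


def flip_origin_vertical(box: Tuple[int, int, int, int], height: int) -> Tuple[int, int, int, int]:
    """Convert a pixel box between top-left-origin and bottom-left-origin conventions.

    The flip is its own inverse, so the same function works in both directions.
    """
    x0, y0, x1, y1 = box
    return (x0, height - y1, x1, height - y0)


# Example: a box covering the lower-middle of an 800x600 image (top-left origin).
pixel_box = normalized_to_pixels((0.25, 0.5, 0.75, 1.0), width=800, height=600)
flipped_box = flip_origin_vertical(pixel_box, height=600)
```

Knowing which of these conventions (pixel vs. normalized, top-left vs. bottom-left origin) the model expects would let me match my prompts and post-processing to it.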
Thank you very much!