
How to get text coordinates (bbox) from phi-3 vision

Open ladanisavan opened this issue 6 months ago • 4 comments

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Hello,

First, thank you for the incredible work you have shared with the phi community. Is there a way to obtain the coordinates (bounding boxes) of the text that phi-3 vision extracts from an input image? This feature would benefit applications that rely on precise text positioning.

Thank you for considering this request.

ladanisavan, Aug 02 '24 10:08