Phi-3CookBook
How to get text coordinates (bbox) from phi-3 vision
This issue is for a: (mark with an x)
- [ ] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Hello,
First, thank you for the incredible work you have shared with the phi community. Is there a way to obtain the coordinates (bounding boxes) of the text in the output that phi-3 vision generates for an input image? This feature would be immensely beneficial for applications that rely on precise text positioning.
Thank you for considering this request.