Word vs Sentence Detection - Visual Question Answering

Open DelmedigoA opened this issue 1 year ago • 1 comment

Hello everyone, I’m currently using the model, and it performs exceptionally well at detecting sentences. However, I’m wondering whether it could also be adapted for word-level detection. If so, could anyone advise which settings would need to be adjusted, or whether it is more a matter of image preprocessing? I’m asking because many Hugging Face VQA models rely on word-level tokenization, and I’d like to align with that approach. Thanks a lot!

Aug 16 '24 10:08 DelmedigoA

Did you find the answer?

Dec 04 '24 05:12 vakidzaci