surya
Word vs Sentence Detection - Visual Question Answering
Hello everyone, I’m currently using the model, and it performs exceptionally well at detecting sentences. However, I’m wondering whether it could also be adapted for word-level detection. If so, could anyone advise on which settings would need to be adjusted, or is it more a matter of image preprocessing? I’m asking because many Hugging Face VQA models rely on word-level tokenization, and I’d like to align with that approach. Thanks a lot!
Did you find the answer?
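One possible workaround, in case it helps: if you already have line-level boxes plus the recognized text for each line, you can approximate word boxes by splitting each line box in proportion to character counts. This is only a rough sketch, not part of surya's API. The function name `split_line_into_words` is hypothetical, and it assumes horizontal text, boxes given as `(x0, y0, x1, y1)`, and roughly uniform character widths, so the boxes will be off for proportional fonts or skewed lines.

```python
def split_line_into_words(line_bbox, text):
    """Approximate word boxes from one line box (hypothetical helper).

    Assumes horizontal text and uniform character width, so results
    are estimates rather than true word-level detections.
    """
    x0, y0, x1, y1 = line_bbox
    if not text:
        return []

    # Width assigned to each character, spaces included.
    char_width = (x1 - x0) / len(text)

    words = []
    i, n = 0, len(text)
    while i < n:
        # Skip whitespace between words.
        if text[i].isspace():
            i += 1
            continue
        # Find the end of the current word.
        j = i
        while j < n and not text[j].isspace():
            j += 1
        word_bbox = (x0 + i * char_width, y0, x0 + j * char_width, y1)
        words.append((text[i:j], word_bbox))
        i = j
    return words


if __name__ == "__main__":
    for word, bbox in split_line_into_words((0, 0, 110, 10), "hello world"):
        print(word, bbox)
```

The resulting `(word, bbox)` pairs are in the shape that word-level layout models typically expect, though you would likely still need to normalize the coordinates to whatever range your VQA model requires.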