Are there ways to improve the accuracy of Question Answering?
I followed the instructions from this notebook and started with just one document, a driving guide for British Columbia.
While the model manages to answer some questions, many answers, even to simple questions (like "What does a flashing green light mean?" or "Can you cross a solid yellow line?"), are either wrong or reported as unavailable, even though the document contains them.
Is there a way to improve the accuracy of matching?
I think you might've run into the number of pages limitation.
Document AI

The following [limits](https://cloud.google.com/document-ai/quotas) apply for online processing with the Document OCR processor:

Limit | Value
-- | --
Maximum file size | 20 MB
**Maximum pages** | **15**

For documents that don't meet these limits, you can use batch processing to extract the document text. (Not covered in this notebook.)
@admatt01's comment is not relevant to this issue because the notebook linked doesn't use Document AI.
My theory is that the current setup of the Question Answering with Documents using LangChain 🦜️🔗 and Vertex AI Matching Engine notebook isn't reading or parsing all of the text from the document. I'll need to investigate whether the GCS loader actually extracts the embedded text or performs OCR.
You might actually have more luck with the Question answering with Documents using Document AI, Pandas, and PaLM notebook, which performs OCR on the input documents.