Are there ways to improve the accuracy of Question Answering?

Open gvzdv opened this issue 1 year ago • 2 comments

I followed the instructions from this notebook and started with just one document, a driving guide for British Columbia.

While the model manages to answer some questions, many answers, even to simple questions (like "what does flashing green light mean?" or "can you cross a solid yellow line?"), are either wrong or reported as unavailable, even though the document contains them.

Is there a way to improve the accuracy of matching?

gvzdv avatar Jan 09 '24 00:01 gvzdv

I think you might've run into the number of pages limitation.

Document AI

The following [limits](https://cloud.google.com/document-ai/quotas) apply for online processing with the Document OCR processor.

| Limit | Value |
| -- | -- |
| Maximum file size | 20 MB |
| **Maximum pages** | **15** |

For documents that don't meet these limits, you can use batch processing to extract the document text. (Not covered in this notebook.)

admatt01 avatar Jan 19 '24 02:01 admatt01

@admatt01's comment is not relevant to this issue because the notebook linked doesn't use Document AI.

My theory is that the current setup of the Question Answering with Documents using LangChain 🦜️🔗 and Vertex AI Matching Engine notebook isn't reading or parsing all of the text from the document. I'll need to do some investigation to see if the GCS Loader actually reads in the text or does OCR.
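One rough way to test that theory before digging into the loader is to check whether the phrases from the failing questions appear anywhere in the text that was actually extracted. The sketch below is not from the notebook: `phrase_coverage` is a hypothetical helper, and the `chunks` list is placeholder data standing in for whatever the loader and text splitter produced (with LangChain documents, that would likely be the `page_content` of each document).

```python
def phrase_coverage(chunks, phrases):
    """Map each phrase to the indices of the chunks that contain it.

    Matching is case-insensitive and ignores differences in whitespace.
    """
    def normalize(text):
        return " ".join(text.lower().split())

    normalized_chunks = [normalize(chunk) for chunk in chunks]
    return {
        phrase: [
            i for i, text in enumerate(normalized_chunks)
            if normalize(phrase) in text
        ]
        for phrase in phrases
    }


# Placeholder chunks (assumption): replace with the text extracted from the real document.
chunks = [
    "Section on traffic signals: flashing  green light ...",
    "Section on parking rules ...",
]

coverage = phrase_coverage(chunks, ["flashing green light", "solid yellow line"])

# Phrases that never show up in the extracted text point to a loading or parsing gap,
# not a retrieval or model problem.
missing = [phrase for phrase, hits in coverage.items() if not hits]
print(missing)
```

If a phrase that is clearly in the PDF shows up as missing, the text was probably never extracted (for example, a scanned page without OCR). If it is present, the problem is more likely in chunking, retrieval, or the prompt.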

You might actually have more luck with the "Question answering with Documents using Document AI, Pandas, and PaLM" notebook, which performs OCR on the input documents.

holtskinner avatar Jan 25 '24 17:01 holtskinner