h2ogpt
h2ogpt copied to clipboard
Question: can the document query model handle the question to ask something about a table's content?
For example, if I have a table of different DL architecture's training and validation accuracy in each row in my pdf file, can I ask "what's the validation accuracy of architecture XX?"? Would that work?
It depends upon whether to embedding finds the correct block of text and how well the PDF->Text was done.
For PDF->text, PDFMiner is really bad, so we use PyPDFLoader, but we will allow others very soon that will be even better.
For block of text picked up, you can check whether the text was picked up properly by the PDF->text and embedding search by using the UI and setting "Only" as the subset to choose and seeing what the source chunks look like from the UI chat window.
Apart from a better PDF reader, we will be improving beyond simple vectordb to cover a larger set of queries and tasks.
privateGPT, etc. all have the same way of dealing with things, so have similar issues that will have to be dealt with to have the best experience with document Q/A.
As an example where it works is I uploaded a PDF of my electricity bill, and the LLM is able to answer what my total bill cost was -- it's able to find the data and give a human response.
agents should be done.