
Verba fails to identify / classify data correctly

Open Network-Sec opened this issue 1 year ago • 1 comment

Description

Using a pure-local config with Ollama and Unstructured in Docker I can import CSV data and interact with it (win!):

OLLAMA_URL="http://192.168.2.204:9350"
OLLAMA_MODEL="llama3"
OLLAMA_EMBED_MODEL="mxbai-embed-large"
UNSTRUCTURED_API_URL="http://192.168.2.216:9360/general/v0/general"
UNSTRUCTURED_API_KEY="pseudokey"
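Before importing, a quick sanity check that both local services respond can rule out connectivity problems. This is just a sketch using Python's requests; the hostnames come from the config above and the probe payload is made up:

```python
# Sketch: verify the local Ollama and Unstructured endpoints are reachable.
import requests

OLLAMA_URL = "http://192.168.2.204:9350"
UNSTRUCTURED_API_URL = "http://192.168.2.216:9360/general/v0/general"

# Ollama exposes GET /api/tags, listing the locally pulled models;
# "llama3" and "mxbai-embed-large" should show up here.
tags = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10).json()
print([m["name"] for m in tags.get("models", [])])

# The Unstructured API accepts multipart POSTs under the "files" field;
# a tiny CSV probe confirms the container answers at this route.
resp = requests.post(
    UNSTRUCTURED_API_URL,
    headers={"unstructured-api-key": "pseudokey"},
    files={"files": ("probe.csv", b"First Name,Last Name\nSteven,Smith\n", "text/csv")},
    timeout=30,
)
print(resp.status_code)
```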

Problem: When importing a CSV with columns like "First Name, Last Name, DoB, Phone" and asking for the phone number of "Steven", Verba does find the data but fails to identify it correctly.

Q: "Give me the phone number of all people with first name Steven" A: "According to the provided context, the phone number of Steven Smith (born 21/11/1979) is not explicitly mentioned. However, based on the chunk numbers and the format of the data, we can infer that Steven Smith's phone number might be somewhere in the range of 00117xxxxxxx, but it would require more information or a specific document to retrieve the exact phone number."

Is this a bug or a feature?

  • [ ] Bug
  • [X] Feature

Steps to Reproduce

  1. Import a small CSV table with columns like "First Name, Last Name, DoB, Phone", using the local configuration provided above (a sketch of such a file follows this list).
  2. Ask for the phone number of one of the people contained in the data.
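For reference, a file of the shape described in step 1 can be generated like this (synthetic sample data, not the actual file from my setup):

```python
# Synthetic sample CSV matching the columns from step 1 (made-up values).
import csv

rows = [
    {"First Name": "Steven", "Last Name": "Smith", "DoB": "21/11/1979", "Phone": "0011712345678"},
    {"First Name": "Anna",   "Last Name": "Meyer", "DoB": "03/05/1985", "Phone": "0011798765432"},
]

with open("people.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["First Name", "Last Name", "DoB", "Phone"])
    writer.writeheader()
    writer.writerows(rows)
```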

Additional context

I'm fully aware this isn't exactly a bug, but at this point I'm completely blind to possible causes or solutions. The data is well-formed and the data points should be easy for the model to identify. I don't know how to improve the situation: do I need to import or chunk the data differently? Is this something to be solved during inference? What could help in this case?
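One direction I could imagine (just a sketch of a generic workaround, not something I know Verba does today) is chunking the CSV row-wise so every chunk repeats the column headers and is self-contained:

```python
# Sketch of a generic row-wise chunking workaround (not Verba's built-in
# chunker): each row becomes one chunk that carries its column names, so a
# single retrieved chunk already reads "First Name: Steven; ...; Phone: ...".
import csv

def csv_to_row_chunks(path: str) -> list[str]:
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        return [
            "; ".join(f"{column}: {value}" for column, value in row.items())
            for row in reader
        ]

for chunk in csv_to_row_chunks("people.csv"):
    print(chunk)
```

That way the retriever only has to surface one chunk per person instead of reassembling the table from separate header and data chunks.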

I'll call it a feature request for now; maybe we just need a few more options to make this work...

Network-Sec commented Jun 08 '24, 20:06

Good point! I can see that long tables might confuse the current retrieval system and, thus, the selected LLM. I think adding metadata in the future could fix this. I'll add it to the feature list 🚀
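Roughly, the idea is to keep structured fields next to each chunk so retrieval can filter on them instead of relying on the LLM to parse free-form table text. Illustrative sketch only, not the final implementation:

```python
# Illustrative sketch only (not the actual Verba implementation): store
# structured, filterable metadata alongside the embedded chunk text.
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str                                     # embedded and shown to the LLM
    metadata: dict = field(default_factory=dict)  # structured, filterable fields

chunk = Chunk(
    text="First Name: Steven; Last Name: Smith; DoB: 21/11/1979; Phone: ...",
    metadata={"source": "people.csv", "row": 1, "first_name": "Steven"},
)

# A retriever could pre-filter on metadata (e.g. first_name == "Steven")
# before or alongside vector search, instead of hoping the right row text
# ends up closest to the query embedding.
print(chunk.metadata["first_name"])
```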

thomashacker commented Jun 27 '24, 13:06

We added metadata support in the newest release; this could fix the issue.

thomashacker commented Sep 03 '24, 13:06