Iván Martínez
The current chunking mechanism (which splits documents into sentences) is not optimal for CSVs. CSVs contain characters such as `,` that are token-hungry. Most probably that chunking mechanism is creating...
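For illustration, here is a minimal sketch (not the repo's actual loader) of chunking a CSV row by row instead of sentence by sentence, so rows stay intact and the header travels with every chunk; the function name and the `rows_per_chunk` default are just placeholders:

```python
import csv

def chunk_csv_by_rows(path: str, rows_per_chunk: int = 20) -> list[str]:
    """Group CSV rows into fixed-size chunks, repeating the header in each chunk."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = [", ".join(row) for row in reader]

    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        body = "\n".join(rows[start:start + rows_per_chunk])
        chunks.append(", ".join(header) + "\n" + body)
    return chunks
```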
We are not using llama.cpp as the embeddings model anymore. Plus, ingest got a LOT faster with the new embeddings model (#224). Note: this is a breaking...
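For reference, the swap looks roughly like this when done through LangChain; the exact model name and defaults below are assumptions, so check #224 / the README for the real values:

```python
from langchain.embeddings import HuggingFaceEmbeddings

# Local sentence-transformers model; the model name here is an assumption.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vector = embeddings.embed_query("hello world")
print(len(vector))  # dimensionality of the embedding
```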
Did you follow the ingest process for your document? Was the `db` directory created during that ingestion?
> No.
>
> Anyway, now I found out the DB folder is created and two parquet files are there; however, I tried to do the ingest part again and...
Apparently it is an error [thrown by Chroma db](https://github.com/chroma-core/chroma/blob/791653e5503dbb2ceb34ff37896c3c34cf6eb95a/chromadb/db/index/hnswlib.py#L230) when querying it (trying to find the context for the prompt using the vectorstore). I'd say your **ingest process didn't work**....
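One way to check is to re-open the persisted store and run a single query against it. This is only a sketch; the paths, model name and LangChain API version are assumptions that may not match your setup:

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma(persist_directory="db", embedding_function=embeddings)

# An empty result here means the ingest didn't populate the vectorstore.
docs = db.similarity_search("test query", k=1)
print("retrieved documents:", len(docs))
```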
Currently, sources are being printed to ease debugging, but feel free to comment out the sources printing at the bottom of `privateGPT.py`.
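If you'd rather keep the code around, a hypothetical alternative is to gate the printing behind a flag instead of deleting it; the helper below is just an illustration, not the repo's actual structure:

```python
def print_sources(docs, hide_source: bool = False) -> None:
    """Print each retrieved source document unless hide_source is set."""
    if hide_source:
        return
    for document in docs:
        print("\n> " + document.metadata["source"] + ":")
        print(document.page_content)
```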
Sounds great! Would you open a PR, @su77ungr?
> https://github.com/su77ungr/CASALIOY
>
> I hard forked the repository and switched to Qdrant vector storage. Runs locally as well and serves requests faster. This solved the issue for me.

Tested...
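For anyone curious, a local Qdrant-backed store through LangChain could look roughly like the sketch below; the collection name, on-disk path and embeddings model are assumptions and may differ from what CASALIOY actually does:

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Qdrant

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

qdrant = Qdrant.from_texts(
    ["hello world"],              # replace with the ingested document chunks
    embeddings,
    path="./qdrant_db",           # local on-disk mode, no Qdrant server required
    collection_name="documents",
)
print(qdrant.similarity_search("hello", k=1))
```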
The error has to do with symbols being present in the original doc. There definitely are some of those in the test document used by this repo. But it is...
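If you want to strip problematic symbols before ingesting, one hypothetical cleanup step (not something the repo ships) is to drop Unicode control characters:

```python
import unicodedata

def strip_control_chars(text: str) -> str:
    """Remove Unicode control characters, which often break downstream parsers."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch)[0] != "C" or ch in "\n\t"
    )
```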
I'll keep an eye on the improvements you pointed out, @su77ungr, and also on your fork. Thanks for sharing!!