haystack-integrations
Add jaguar-document-store.md
This PR adds a new document store for the Haystack framework. The Jaguar document store is a distributed database that scales horizontally with ease (instant horizontal scaling through its ZeroMove mechanism). It can store documents, vectors, and blob data, and it can detect anomalous documents. It can also search for similar documents with time-decay modulation. The software supports multi-tenant multi-member, single-tenant multi-member, and single-tenant single-member cloud operation models.
@bilgeyucel Thanks so much for the suggested changes, which are really helpful. We updated jaguar-document-store.md and added JaguarEmbeddingRetriever to retriever.py in https://github.com/fserv/haystack-integrations, directory /src/jaguar_haystack/document_stores. The other suggested changes are also included in the new commit. Could you please review again? Thanks.
@bilgeyucel Thanks for all the pointers and suggestions! All the needed changes have been made and pushed. The default "all-mpnet-base-v2" embedding model has a dimension of 768, which may have caused the js['data'] error. Also, please make sure the Jaguar server and its HTTP gateway server are up and running after the "docker pull jaguardb/jaguardb_with_http; docker run -d -p 8888:8888 -p 8080:8080 --name jaguardb_with_http jaguardb/jaguardb_with_http" commands (may require sudo on your system).
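A mismatch between the store's configured vector dimension and the embedder's output size can be caught before anything is written. The sketch below is only an illustration: check_embedding_dimension is a hypothetical helper, not part of jaguar-haystack, and it works on a plain sequence of floats.

```python
from typing import Sequence


def check_embedding_dimension(embedding: Sequence[float], expected_dim: int) -> None:
    """Raise ValueError if the embedding length differs from the store's dimension.

    Hypothetical helper for illustration; not part of jaguar-haystack.
    """
    actual_dim = len(embedding)
    if actual_dim != expected_dim:
        raise ValueError(
            f"embedding has {actual_dim} dimensions, store expects {expected_dim}"
        )


# all-mpnet-base-v2 produces 768-dimensional vectors, so a store created
# with vector_dimension = 768 accepts them; this call passes silently.
check_embedding_dimension([0.0] * 768, expected_dim=768)
```

Calling it on each document's embedding just before write_documents() would turn a silent dimension mismatch into an explicit error message.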
Hi @fserv, I get the same errors. Can you make sure that the new version of the package is published on pip?
Here's the code snippet I used to test:
from jaguar_haystack.jaguar import JaguarDocumentStore
url = "http://127.0.0.1:8080/fwww/"
pod = "vdb"
store = "haystack_test_store"
vector_index = "v"
vector_type = "cosine_fraction_float"
vector_dimension = 1536 # dim of "text-embedding-ada-002" by OpenAI
document_store = JaguarDocumentStore(
    pod,
    store,
    vector_index,
    vector_type,
    vector_dimension,
    url,
)
print(document_store.filter_documents({})) # Should return [] -> works ✅
print(document_store.count_documents()) # Should return 0 -> fails, throws jd = json.loads(js[0]) error
from haystack.components.embedders import OpenAIDocumentEmbedder
from haystack.dataclasses import Document
embedder = OpenAIDocumentEmbedder(api_key=OPENAI_API_KEY)
result = embedder.run(documents=[Document(content="Return of King Lear")])
document_store.write_documents(documents=result["documents"]) # should write the documents
print(document_store.count_documents()) # Should return 1
Hi @bilgeyucel, sorry about that. The package is now updated. You can run "pip install -U jaguar-haystack" to get the latest version. Thanks!
Hi @fserv, I can't seem to write_documents() into JaguarDocumentStore even with the new version. Here's the error:
Traceback (most recent call last):
File "/Users/bilgeyucel/Documents/side-projects/jaguar-haystack/test.py", line 28, in <module>
document_store.write_documents(documents=result["documents"]) # should write the documents
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/jaguar_haystack/jaguar.py", line 124, in write_documents
zid = self.add_text(text, embedding, metadata, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/jaguar_haystack/jaguar.py", line 260, in add_text
textcol = js["data"]
~~^^^^^^^^
KeyError: 'data'
Maybe there's something wrong with how I run the docker image. These are the logs that I get in the container. I'm using a MacBook Air M2.
2024-01-30 21:17:03 Starting jaguardb in docker container
2024-01-30 21:17:44 Starting fwww/http server in docker container
2024-01-30 21:17:44 Restart netmap_server ...
2024-01-30 21:17:44 netmap_server is not running
2024-01-30 21:17:46 Restart pyai_server ...
2024-01-30 21:17:46 pyai_server is not running
2024-01-30 21:17:46 netmap_server is not running
2024-01-30 21:17:48 Restart lighttpd and fwww ...
2024-01-30 21:17:48 Stopping fwww ...
2024-01-30 21:17:48 pkill is /usr/bin/pkill
2024-01-30 21:17:48 pyai_server is not running
2024-01-30 21:17:49 Name: sentence-transformers
2024-01-30 21:17:49 Version: 2.2.2
2024-01-30 21:17:49 Summary: Multilingual text embeddings
2024-01-30 21:17:49 Home-page: https://github.com/UKPLab/sentence-transformers
2024-01-30 21:17:49 Author: Nils Reimers
2024-01-30 21:17:49 Author-email: [email protected]
2024-01-30 21:17:49 License: Apache License 2.0
2024-01-30 21:17:49 Location: /usr/local/lib/python3.10/dist-packages
2024-01-30 21:17:49 Requires: huggingface-hub, nltk, numpy, scikit-learn, scipy, sentencepiece, torch, torchvision, tqdm, transformers
2024-01-30 21:17:49 Required-by:
2024-01-30 21:17:49 Found sentence-transformers pip package, OK
2024-01-30 21:17:50 /home/jaguar/fwww/conf_dir/lighttpd.conf is found, OK
2024-01-30 21:17:50 /home/jaguar/fwww/bin_dir/lighttpd -f /home/jaguar/fwww/conf_dir/lighttpd.conf
Hi @bilgeyucel, most likely one of the pyai_server, netmap_server, or fwww server processes is not running properly. You can try:
- docker exec -it jaguardb_with_http /bin/bash
- ps aux|grep netmap
- ps aux|grep pyai
- ps aux|grep lighttp
- ps aux|grep fwww
If any server process is not up, you can do this:
cd /home/jaguar/fwww/bin_dir
./start_all_servers.sh
and check again with the "ps aux|grep ..." above. There might be package issues, etc.
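As a complement from the host side, a quick probe can tell whether anything answers HTTP at the gateway URL used in the test script. This is a rough sketch under assumptions: gateway_responds is a hypothetical helper, the URL is the one from the snippet above, and any HTTP reply (even an error status) is counted as "responding". It says nothing about whether pyai_server or netmap_server are healthy.

```python
import http.client
import urllib.error
import urllib.request


def gateway_responds(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at url, whatever the status code.

    Hypothetical helper for illustration; not part of jaguar-haystack.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is reachable.
        return True
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, reset, timed out, or a malformed reply.
        return False


if __name__ == "__main__":
    # URL taken from the test script in this thread.
    url = "http://127.0.0.1:8080/fwww/"
    state = "responding" if gateway_responds(url) else "not responding"
    print(f"{url} is {state}")
```

If the probe reports "not responding", the "ps aux|grep ..." checks inside the container are the next step.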
Hi @bilgeyucel, we did some debugging and found that our documentation was missing the document_store.login("demouser") and document_store.create() steps. Sorry for the error. The server startup messages are for reporting purposes only and can be ignored. We checked a Docker container on a Mac system and saw that all server processes started fine. The login() and create() steps have been added in the latest commits. Please add the login() and create() steps to your script too. Thanks!
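The required call order can be sketched as a small wrapper. This is only an illustration: prepare_store is a hypothetical helper, and it calls login("demouser") and create() exactly as written in this comment. Whether create() takes further arguments is not shown in this thread, so check the package documentation before relying on the bare call.

```python
def prepare_store(store, user: str = "demouser"):
    """Log in and create the store before any count/write call.

    Hypothetical helper for illustration. `store` is expected to be a
    JaguarDocumentStore; the bare create() call mirrors the comment above
    and may need arguments in the real package.
    """
    store.login(user)  # authenticate first
    store.create()     # then create the store on the server
    return store


# Intended usage with the test script from this thread (not run here):
#
#   document_store = JaguarDocumentStore(
#       pod, store, vector_index, vector_type, vector_dimension, url
#   )
#   prepare_store(document_store)
#   print(document_store.count_documents())
```

Running these two steps right after constructing the store, and before count_documents() or write_documents(), matches the fix described above.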