marqo
Error during processing of large documents
Docker, 6 CPU, 16 GB of RAM, Mac OS Ventura 13.2.1, M2 Max
Using a slightly modified version of https://github.com/iain-mackie/marqo-gpt3, I am trying to process 10 documents of 80,000 characters each.
I am making a single call, mq.index(DOC_INDEX_NAME).add_documents(docs), where docs contains 10 elements.
Client log
Establishing connection to marqo client.
Indexing documents
Traceback (most recent call last):
File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 128, in __validate
request.raise_for_status()
File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://localhost:8882/indexes/yogi-index/stats
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/bazuker/Documents/user/playground/openai/marqo-gpt3/main.py", line 67, in <module>
print(f'document index build: {mq.index(DOC_INDEX_NAME).get_stats()}')
File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/index.py", line 455, in get_stats
return self.http.get(path=f"indexes/{self.index_name}/stats")
File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 88, in get
return self.send_request(s.get, path=path, body=body, content_type=content_type)
File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 74, in send_request
return self.__validate(response)
File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 131, in __validate
convert_to_marqo_error_and_raise(response=request, err=err)
File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 145, in convert_to_marqo_error_and_raise
raise MarqoWebError(message=response_msg, code=code, error_type=error_type,
marqo.errors.MarqoWebError: MarqoWebError: MarqoWebError Error message: {'message': 'Index `yogi-index` not found.', 'code': 'index_not_found', 'type': 'invalid_request', 'link': None}
status_code: 404, type: invalid_request, code: index_not_found, link:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 128, in __validate
request.raise_for_status()
File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:8882/indexes/yogi-index/documents?refresh=true&device=cpu&use_existing_tensors=false
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/bazuker/Documents/user/playground/openai/marqo-gpt3/main.py", line 72, in <module>
response = mq.index(DOC_INDEX_NAME).add_documents(docs)
File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/index.py", line 277, in add_documents
return self._generic_add_update_docs(
File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/index.py", line 393, in _generic_add_update_docs
res = self.http.post(path=path_with_query_str, body=documents)
File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 96, in post
return self.send_request(s.post, path, body, content_type)
File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 74, in send_request
return self.__validate(response)
File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 131, in __validate
convert_to_marqo_error_and_raise(response=request, err=err)
File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 145, in convert_to_marqo_error_and_raise
raise MarqoWebError(message=response_msg, code=code, error_type=error_type,
marqo.errors.MarqoWebError: MarqoWebError: MarqoWebError Error message: {'message': "\nPlease create an issue on Marqo's GitHub repo (https://github.com/marqo-ai/marqo/issues) if this problem persists.", 'code': 'unhandled_backend_error', 'type': 'backend_error', 'link': ''}
status_code: 500, type: backend_error, code: unhandled_backend_error, link:
marqo container log
2023-03-26 15:37:15 INFO: 172.17.0.1:60962 - "DELETE /indexes/yogi-index HTTP/1.1" 200 OK
2023-03-26 15:37:15 INFO: 172.17.0.1:60962 - "GET /indexes/yogi-index/stats HTTP/1.1" 404 Not Found
2023-03-26 15:40:25 INFO: 172.17.0.1:60962 - "POST /indexes/yogi-index/documents?refresh=true&device=cpu&use_existing_tensors=false HTTP/1.1" 500 Internal Server Error
marqo-os container log
2023-03-26 15:37:15 [2023-03-26T22:37:15,896][INFO ][o.o.c.m.MetadataDeleteIndexService] [f9e350fd26d2] [yogi-index/JHquJ9NYQiy5OcKpJ3OQmA] deleting index
2023-03-26 15:37:15 [2023-03-26T22:37:15,949][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [f9e350fd26d2] Detected cluster change event for destination migration
2023-03-26 15:37:15 [2023-03-26T22:37:15,949][ERROR][o.o.i.i.ManagedIndexCoordinator] [f9e350fd26d2] get managed-index failed: [.opendistro-ism-config] IndexNotFoundException[no such index [.opendistro-ism-config]]
2023-03-26 15:37:15 [2023-03-26T22:37:15,997][INFO ][o.o.c.m.MetadataCreateIndexService] [f9e350fd26d2] [yogi-index] creating index, cause [api], templates [], shards [5]/[1]
2023-03-26 15:37:16 [2023-03-26T22:37:16,021][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [f9e350fd26d2] Detected cluster change event for destination migration
2023-03-26 15:37:16 [2023-03-26T22:37:16,041][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [f9e350fd26d2] Detected cluster change event for destination migration
2023-03-26 15:37:16 [2023-03-26T22:37:16,048][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [f9e350fd26d2] Detected cluster change event for destination migration
2023-03-26 15:37:16 [2023-03-26T22:37:16,058][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [f9e350fd26d2] Detected cluster change event for destination migration
2023-03-26 15:37:16 [2023-03-26T22:37:16,075][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [f9e350fd26d2] Detected cluster change event for destination migration
2023-03-26 15:40:06 [2023-03-26T22:40:06,159][INFO ][o.o.j.s.JobSweeper ] [f9e350fd26d2] Running full sweep
2023-03-26 15:40:16 [2023-03-26T22:40:16,277][INFO ][o.o.c.m.MetadataMappingService] [f9e350fd26d2] [yogi-index/b4q1aq9DRr22BFopiIyQ1A] update_mapping [_doc]
2023-03-26 15:40:16 [2023-03-26T22:40:16,287][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [f9e350fd26d2] Detected cluster change event for destination migration
2023-03-26 15:45:06 [2023-03-26T22:45:06,162][INFO ][o.o.j.s.JobSweeper ] [f9e350fd26d2] Running full sweep
Hi @bazuker! Thanks for raising the issue. I suspect the request might be too large if the documents are that big. Can you try sending them one by one, or use client_batch_size=1?
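For reference, a minimal sketch of the "one by one" suggestion. The helper name add_documents_individually is hypothetical and not part of the Marqo client; it only assumes an index object exposing add_documents(list_of_docs), as in the call from the original report. It records which document fails instead of stopping at the first error.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

# (position in docs, server response or None, exception or None)
Result = Tuple[int, Optional[Any], Optional[Exception]]


def add_documents_individually(index: Any, docs: Sequence[Dict[str, Any]]) -> List[Result]:
    """Send each document in its own add_documents request.

    `index` is assumed to be whatever mq.index(DOC_INDEX_NAME) returns;
    only its add_documents method is used.
    """
    results: List[Result] = []
    for position, doc in enumerate(docs):
        try:
            response = index.add_documents([doc])
            results.append((position, response, None))
        except Exception as exc:  # keep going so every failing document is found
            results.append((position, None, exc))
    return results


def failed_positions(results: Sequence[Result]) -> List[int]:
    """Return the positions of the documents whose request raised an error."""
    return [position for position, _, error in results if error is not None]
```

With the script from the report this would be called as add_documents_individually(mq.index(DOC_INDEX_NAME), docs). The alternative mentioned above is to keep the single call and pass client_batch_size=1 to add_documents.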
Actually, it looks like an index not found error. Were you able to run any of the examples from the readme?
Was there any other stack trace in the Marqo container logs? If not, you can increase the log level to debug while running Marqo. Finally, did this problem subside when indexing smaller documents?
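One way to check the "smaller documents" question is to index truncated copies of the same documents. This is only a sketch: shrink_documents is a hypothetical helper, and it assumes each document is a flat dict whose long values are plain strings, with "_id" as the only field that must be left untouched.

```python
from typing import Any, Dict, List, Sequence


def shrink_documents(docs: Sequence[Dict[str, Any]], max_chars: int) -> List[Dict[str, Any]]:
    """Return copies of docs with every string field cut to max_chars characters.

    The "_id" field and all non-string values are copied unchanged.
    The input documents are not modified.
    """
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    shrunk: List[Dict[str, Any]] = []
    for doc in docs:
        copy: Dict[str, Any] = {}
        for field, value in doc.items():
            if field != "_id" and isinstance(value, str):
                copy[field] = value[:max_chars]
            else:
                copy[field] = value
        shrunk.append(copy)
    return shrunk
```

For example, indexing shrink_documents(docs, 5000) and then raising the limit step by step would show whether the 500 response depends on document length.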
Also, what Marqo version and client versions are you on? You can check by running this:
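import marqo
# Connects to the default local Marqo instance.
mq = marqo.Client()
# Version reported by the running Marqo server.
print("Marqo version information:\n", mq.get_marqo())
# Marqo version the installed Python client is built for.
print("Marqo python client information:\n", marqo.supported_marqo_version())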