datasketch
Getting `pymongo.errors.ExecutionTimeout` when using more than one instance of `AsyncMinHashLSH`
I have a use case where I need to store MinHashes for n different categories of files and then query them. For example, if I have documents of category A, I want to store all of them in one DB and query them at a later point.
I am initializing this in a FastAPI microservice as follows:
@app.on_event("startup")
async def startup_event():
    LSHService.lsh_js, LSHService.lsh_css, LSHService.lsh_html, = await asyncio.gather(
        *[
            await AsyncMinHashLSH(
                storage_config={
                    **_storage,
                    "basename": "category-a".encode(),
                },
                threshold=0.9,
                num_perm=256,
            ),
            await AsyncMinHashLSH(
                storage_config={
                    **_storage,
                    "basename": "category-b".encode(),
                },
                threshold=0.9,
                num_perm=256,
            ),
            await AsyncMinHashLSH(
                storage_config={
                    **_storage,
                    "basename": "category-b".encode(),
                },
                threshold=0.9,
                num_perm=256,
            ),
        ]
    )
If I initialize it like this, I get a `pymongo.errors.ExecutionTimeout` error.
However, if I maintain just one object, I don't get any error.
What can I do to mitigate this? Thanks in advance.
Not an expert in async. @aastafiev would you be able to take a look?
Any update on this?
Hi. I see you are creating a duplicated category (`category-b` appears twice). Is that intentional? Maybe Mongo doesn't “like” it. Next, could you try creating the objects one at a time (just as an experiment) instead of inside the `gather` call?
Sincerely yours, Aleksey Astafiev
On 13 Jun 2022, at 12:01, RonaldRegan69 @.***> wrote: any update over this?
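The two suggestions above (check for a repeated basename, then create the objects one at a time instead of through `gather`) could be tried with something like the sketch below. It is only an illustration of the experiment, not a confirmed fix for the timeout. `duplicated_basenames`, `init_one_by_one`, and `FakeLSH` are hypothetical names introduced here; `FakeLSH` is a stand-in so the snippet runs without MongoDB, and it only assumes that the real object is awaitable, as the code in the question implies.

```python
import asyncio
from collections import Counter
from typing import Any, Awaitable, Callable, Dict, List


def duplicated_basenames(basenames: List[bytes]) -> List[bytes]:
    """Return every basename that appears more than once."""
    return [name for name, count in Counter(basenames).items() if count > 1]


async def init_one_by_one(
    factories: Dict[str, Callable[[], Awaitable[Any]]]
) -> Dict[str, Any]:
    """Await each factory in turn, so only one initialization runs at a time."""
    results: Dict[str, Any] = {}
    for key, make in factories.items():
        results[key] = await make()
    return results


class FakeLSH:
    """Hypothetical stand-in for AsyncMinHashLSH (assumption: awaiting it
    performs the storage setup and returns the ready object)."""

    def __init__(self, basename: bytes) -> None:
        self.basename = basename
        self.ready = False

    async def _init(self) -> "FakeLSH":
        await asyncio.sleep(0)  # placeholder for the real storage setup
        self.ready = True
        return self

    def __await__(self):
        return self._init().__await__()


async def main() -> Dict[str, FakeLSH]:
    # Distinct basenames, one per category.
    names = {"js": b"category-a", "css": b"category-b", "html": b"category-c"}
    dupes = duplicated_basenames(list(names.values()))
    if dupes:
        raise ValueError(f"duplicated basenames: {dupes}")
    # Bind each basename as a default argument so every lambda keeps its own value.
    factories = {key: (lambda b=base: FakeLSH(b)) for key, base in names.items()}
    return await init_one_by_one(factories)
```

To run the experiment against the real service, each `FakeLSH(b)` would be replaced by the `AsyncMinHashLSH(storage_config=..., threshold=0.9, num_perm=256)` call from the question, and `main()` would be awaited from the startup handler (or run with `asyncio.run(main())` in a standalone script).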