Getting `pymongo.errors.ExecutionTimeout` when using more than one instance of AsyncMinHashLSH

Open RonaldRegan69 opened this issue 2 years ago • 3 comments

I have a use case where I need to store the MinHashes of n different categories of files and then query them. For example, if I have documents of category A, I want to store all of them in one DB and query them at a later point.

I am initializing this in a FastAPI microservice as follows:

@app.on_event("startup")
async def startup_event():
    LSHService.lsh_js, LSHService.lsh_css, LSHService.lsh_html, = await asyncio.gather(
        *[
            await AsyncMinHashLSH(
                storage_config={
                    **_storage,
                    "basename": "category-a".encode(),
                },
                threshold=0.9,
                num_perm=256,
            ),
            await AsyncMinHashLSH(
                storage_config={
                    **_storage,
                    "basename": "category-b".encode(),
                },
                threshold=0.9,
                num_perm=256,
            ),
            await AsyncMinHashLSH(
                storage_config={
                    **_storage,
                    "basename": "category-b".encode(),
                },
                threshold=0.9,
                num_perm=256,
            ),
        ]
    )

If I initialize it like this, I get a `pymongo.errors.ExecutionTimeout` error.

However, if I maintain just one object, I don't get any error.

What can I do to mitigate this? Thanks in advance.

RonaldRegan69 • Jun 06 '22 14:06

Not an expert in async. @aastafiev would you be able to take a look?

ekzhu • Jun 07 '22 05:06

Any update on this?

RonaldRegan69 • Jun 13 '22 08:06

Hi. I see you are creating a duplicated category (`category-b` is used twice). Is that intentional? Maybe MongoDB doesn't "like" it. Also, could you try creating the objects one at a time (just as an experiment) instead of inside the `gather` call?
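The experiment suggested above could be sketched roughly as follows. This is only an illustration, not a confirmed fix: `create_indexes_sequentially` and `fake_factory` are hypothetical names, the third basename `category-c` is an assumed correction of the duplicate, and the in-memory factory stands in for the real `AsyncMinHashLSH` constructor so the snippet runs without MongoDB. The helper awaits one index at a time and refuses duplicate basenames up front.

```python
import asyncio


async def create_indexes_sequentially(factory, basenames):
    """Create one index per basename, awaiting each before starting the next.

    `factory` is any async callable that takes a basename and returns an
    initialized index (hypothetical helper, not part of datasketch).
    """
    if len(set(basenames)) != len(basenames):
        raise ValueError(f"duplicate basenames: {basenames!r}")
    indexes = {}
    for name in basenames:
        # No gather here: each index finishes initializing before the next starts.
        indexes[name] = await factory(name)
    return indexes


async def fake_factory(basename):
    # Stand-in for the real constructor. With datasketch it might look like
    # (assumption, mirroring the call from the issue):
    #   return await AsyncMinHashLSH(
    #       storage_config={**_storage, "basename": basename},
    #       threshold=0.9, num_perm=256)
    await asyncio.sleep(0)
    return {"basename": basename}


if __name__ == "__main__":
    names = [b"category-a", b"category-b", b"category-c"]  # category-c is assumed
    print(asyncio.run(create_indexes_sequentially(fake_factory, names)))
```

If the timeout disappears with sequential creation and distinct basenames, that would narrow the cause to either the duplicated basename or the way the instances are initialized together.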


Sincerely yours, Aleksey Astafiev


aastafiev • Jun 13 '22 08:06