impact-graph icon indicating copy to clipboard operation
impact-graph copied to clipboard

WORLD_SIMILARITY query is slow

Open aminlatifi opened this issue 9 months ago • 4 comments

SELECT
	COUNT(DISTINCT ("project"."id")) AS "cnt"
FROM
	"public"."project" "project"
	LEFT JOIN "public"."project_status" "status" ON "status"."id" = "project"."statusId"
	LEFT JOIN "public"."project_users_user" "project_users" ON "project_users"."projectId" = "project"."id"
	LEFT JOIN "public"."user" "users" ON "users"."id" = "project_users"."userId"
	LEFT JOIN "public"."project_address" "addresses" ON "addresses"."projectId" = "project"."id"
	LEFT JOIN "public"."organization" "organization" ON "organization"."id" = "project"."organizationId"
	LEFT JOIN "public"."project_qf_rounds_qf_round" "project_qfRounds" ON "project_qfRounds"."projectId" = "project"."id"
	LEFT JOIN "public"."qf_round" "qfRounds" ON "qfRounds"."id" = "project_qfRounds"."qfRoundId"
	INNER JOIN "public"."user" "user" ON "user"."id" = "project"."adminUserId"
	LEFT JOIN "public"."project_categories_category" "project_categories" ON "project_categories"."projectId" = "project"."id"
	LEFT JOIN "public"."category" "categories" ON "categories"."id" = "project_categories"."categoryId"
	AND ("categories"."isActive" = $1)
	LEFT JOIN "public"."main_category" "mainCategory" ON "mainCategory"."id" = "categories"."mainCategoryId"
	LEFT JOIN "public"."project_power_view" "projectPower" ON "projectPower"."projectId" = "project"."id"
	LEFT JOIN "public"."project_instant_power_view" "projectInstantPower" ON "projectInstantPower"."projectId" = "project"."id"
WHERE
	"project"."statusId" = $5
	AND "project"."reviewStatus" = $2
	AND (
		WORD_SIMILARITY ("project"."title", $3) > $4
		OR WORD_SIMILARITY ("project"."description", $3) > $4
		OR WORD_SIMILARITY ("project"."impactLocation", $3) > $4
		OR WORD_SIMILARITY ("user"."name", $3) > $4
	)

This query takes a long time to be executed. image

aminlatifi avatar May 06 '24 09:05 aminlatifi

@jainkrati It's the functionality added by Vitor, as you remember we didn't create any index for it.

aminlatifi avatar May 06 '24 09:05 aminlatifi

Simply adding indexes like below doesn't make the query utilize them,

 CREATE INDEX if not exists trgm_idx_project_title ON project USING GIN (title gin_trgm_ops);
 CREATE INDEX if not exists trgm_idx_project_description ON project USING GIN (description gin_trgm_ops);
 CREATE INDEX if not exists trgm_idx_project_impactLocation ON project USING GIN ("impactLocation" gin_trgm_ops);
 CREATE INDEX if not exists trgm_idx_user_name ON public.user USING GIN (name gin_trgm_ops);

Further investigation is required.

aminlatifi avatar May 06 '24 11:05 aminlatifi

It's a complain about its functionality https://github.com/Giveth/giveth-dapps-v2/issues/4120

aminlatifi avatar May 06 '24 12:05 aminlatifi

Needs further investigation on possible solutions.

jainkrati avatar May 06 '24 13:05 jainkrati

@jainkrati It's needed to evaluate the query execution time change with old values reported here https://github.com/Giveth/impact-graph/issues/1534#issue-2280447642 @maryjaf and @mhmdksh, I guess Krati asked you to team up and do this test.

aminlatifi avatar May 08 '24 09:05 aminlatifi

I have only access on sebastian server to run the postman collection And I have run it There is no time out and 500 error in the result and the status code for all test was 200

image

But I had set the response time as an assertion in the tests and it equals 1000 ms, this assertion has failed because the response time is more than 1 second like as you see in above pic. We need to check what is the expected average response time for setting in the assertions and then i'll change 1000ms to a expected time

maryjaf avatar May 08 '24 09:05 maryjaf

@aminlatifi I see there is another issue related to pg extension for search. I would say, it would be best if you can take this issue along with the other and spend 1 hour everyday to make progress.

jainkrati avatar May 08 '24 12:05 jainkrati

Pls test on staging @mhmdksh @maryjaf and then push to production if all looks good

jainkrati avatar May 08 '24 13:05 jainkrati

I have only access on sebastian server to run the postman collection And I have run it There is no time out and 500 error in the result and the status code for all test was 200

image

But I had set the response time as an assertion in the tests and it equals 1000 ms, this assertion has failed because the response time is more than 1 second like as you see in above pic. We need to check what is the expected average response time for setting in the assertions and then i'll change 1000ms to a expected time

I did the load test but I don't have an access to connect to db, if it should be tested by me please set an access for me @mhmdksh @aminlatifi

maryjaf avatar May 12 '24 09:05 maryjaf

@maryjaf The DB access has been granted to you, please feel free to test on staging

mhmdksh avatar May 13 '24 10:05 mhmdksh

to you, please feel free t

@mhmdksh For test, we must look at the digital ocean metrics to see how long it takes to run each instance of word similarity type search.

aminlatifi avatar May 13 '24 15:05 aminlatifi

to you, please feel free t

@mhmdksh For test, we must look at the digital ocean metrics to see how long it takes to run each instance of word similarity type search.

@mhmdksh I am doing that with Maryam

aminlatifi avatar May 13 '24 15:05 aminlatifi

@maryjaf Can we deploy it on the production?

aminlatifi avatar May 14 '24 12:05 aminlatifi

Yeah, I moved it to done and it's ready for deploying on production

maryjaf avatar May 14 '24 12:05 maryjaf

lets close it @maryjaf @divine-comedian

jainkrati avatar May 29 '24 12:05 jainkrati