
Performance issue when using .All on large table

Open vitroz opened this issue 5 years ago • 5 comments

If you're having a generation problem please answer these questions before submitting your issue. Thanks!

What version of SQLBoiler are you using (sqlboiler --version)?

v3.7.1

What is your database and version (e.g. PostgreSQL 10)?

Postgresql 12.4

If this happened at generation time what was the full SQLBoiler command you used to generate your models? (if not applicable leave blank)

If this happened at runtime what code produced the issue? (if not applicable leave blank)

What is the output of the command above with the -d flag added to it? (Provided you are comfortable sharing this, it contains a blueprint of your schema)

Please provide a relevant database schema so we can replicate your issue (Provided you are comfortable sharing this)

CREATE TABLE public.people (
    id uuid NOT NULL DEFAULT uuid_generate_v1mc(),
    first_name character varying(60) COLLATE pg_catalog."default" NOT NULL,
    middle_name character varying(60) COLLATE pg_catalog."default",
    last_name character varying(150) COLLATE pg_catalog."default" NOT NULL,
    full_name_raw character varying(255) COLLATE pg_catalog."default",
    external_source character varying(255) COLLATE pg_catalog."default",
    external_id character varying(255) COLLATE pg_catalog."default",
    created_at timestamp without time zone NOT NULL DEFAULT now(),
    updated_at timestamp without time zone NOT NULL DEFAULT now(),
    total_campaign_contributions bigint,
    CONSTRAINT people_pkey PRIMARY KEY (id)
);

CREATE INDEX idx_external_id_people ON public.people USING btree (external_id COLLATE pg_catalog."default" ASC NULLS LAST) TABLESPACE pg_default;

CREATE TABLE addresses (
    id uuid DEFAULT public.uuid_generate_v1mc() NOT NULL,
    address text,
    created_at timestamp without time zone DEFAULT now() NOT NULL,
    updated_at timestamp without time zone DEFAULT now() NOT NULL,
    CONSTRAINT addresses_pkey PRIMARY KEY (id)
);

CREATE TABLE public.person_delimiters (
    person_id uuid NOT NULL,
    address_id uuid NOT NULL,
    CONSTRAINT person_addresses_pkey PRIMARY KEY (person_id, address_id),
    CONSTRAINT person_addresses_address_id_fkey FOREIGN KEY (address_id)
        REFERENCES public.addresses (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION,
    CONSTRAINT person_addresses_person_id_fkey FOREIGN KEY (person_id)
        REFERENCES public.people (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION
);

(simplified versions; the actual tables have more columns)

Further information. What did you do, what did you expect?

Hi, I am currently dealing with a performance issue when fetching batches of my People entity. My people table currently holds 132M records; is this impacting the performance? I run:

	qmods := []qm.QueryMod{
		qm.Load("Addresses"),
		qm.Limit(limit),
		qm.Offset(offset),
	}

	records, err := entities.People(qmods...).All(context.Background(), apiCtx.DB)

But it hangs on this call for minutes. I'm only experiencing this problem on my People entity; I have another one that holds about 1.2M records, and it runs fine.

By debugging, I found out that each call consistently takes 11~12 minutes.
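
Roughly, I'm timing it like this (a sketch around the call from above; fmt and time imports assumed):

	start := time.Now()
	fmt.Println("DB call Started at", start.UTC())
	// the call from above, wrapped in crude start/finish stamps
	records, err := entities.People(qmods...).All(context.Background(), apiCtx.DB)
	fmt.Println("DB call finished at", time.Now().UTC())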

DB call Started at
2020-10-31 15:19:36.121491593 
DB call finished at
2020-10-31 15:30:57.340522467 


Current page: 3, pages remaining: 23362

DB call Started at
2020-10-31 15:30:57.370881327 
DB call finished at
2020-10-31 15:42:19.255292022 

Any ideas on what might be impacting the People entity?

vitroz avatar Oct 31 '20 14:10 vitroz

We need some better profiling here. sqlboiler is going to do two queries for this: one that selects from the people table, and another that selects from the addresses table with an inner join through the many-to-many table.

What I'd suggest is to try the queries sqlboiler is generating directly in Postgres and see if they're problematic. If they are, it could be the result of improper indexing or something of the sort.

Once we know the database queries are fast, we can dig into what sqlboiler is doing wrong.
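
Roughly, the two statements should look like this (a sketch; the exact SQL varies by version):

-- query 1: the root select from People(...).All
SELECT "people".* FROM "people" LIMIT $1 OFFSET $2;

-- query 2: the eager load for qm.Load("Addresses"), joined through the many-to-many table
SELECT "addresses".*, "a"."person_id"
FROM "addresses"
INNER JOIN "person_delimiters" AS "a" ON "addresses"."id" = "a"."address_id"
WHERE "a"."person_id" IN (/* the ids returned by query 1 */);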

aarondl avatar Nov 04 '20 04:11 aarondl

Hi @aarondl, thanks a lot for your time and response.

I tried the queries and they run fine. In the query mods, I am setting a limit/offset of 1000 records per call:

explain analyze SELECT "addresses".*, "a"."person_id" FROM "addresses" INNER JOIN "person_delimiters" as "a" on "addresses"."id" = "a"."address_id" WHERE ("a"."person_id" IN (SELECT id FROM "people" LIMIT 1000));
                                                                            QUERY PLAN                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=49.15..32696.41 rows=59366 width=135) (actual time=140.512..147.195 rows=1250 loops=1)
   ->  Nested Loop  (cost=48.72..6257.15 rows=59366 width=32) (actual time=140.494..144.267 rows=1250 loops=1)
         ->  HashAggregate  (cost=48.15..58.15 rows=1000 width=16) (actual time=140.470..140.641 rows=1000 loops=1)
               Group Key: people.id
               ->  Limit  (cost=0.00..35.65 rows=1000 width=16) (actual time=139.979..140.194 rows=1000 loops=1)
                     ->  Seq Scan on people  (cost=0.00..4878658.52 rows=136857552 width=16) (actual time=139.978..140.110 rows=1000 loops=1)
         ->  Index Only Scan using person_delimiters_pkey on person_delimiters a  (cost=0.57..5.61 rows=59 width=32) (actual time=0.003..0.003 rows=1 loops=1000)
               Index Cond: (person_id = people.id)
               Heap Fetches: 0
   ->  Index Scan using addresses_pkey on addresses  (cost=0.43..0.45 rows=1 width=119) (actual time=0.002..0.002 rows=1 loops=1250)
         Index Cond: (id = a.delimiter_id)
 Planning Time: 0.434 ms
 Execution Time: 147.297 ms
(13 records)
explain analyze select * from people limit 1000;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..35.65 rows=1000 width=1863) (actual time=0.006..0.166 rows=1000 loops=1)
   ->  Seq Scan on people  (cost=0.00..4878658.52 rows=136857552 width=1863) (actual time=0.005..0.084 rows=1000 loops=1)
 Planning Time: 39.630 ms
 Execution Time: 0.246 ms
(4 records)

Do you think this has something to do with boiler v3?

vitroz avatar Nov 04 '20 12:11 vitroz

Well, there have been quite a decent number of fixes since v3, so it doesn't hurt to try the latest. Without being able to get hands-on with this problem, there's nothing I can do from here. You'll have to keep measuring and profiling until you find what's running so slow. Once you narrow down what it is, we can decide what to do about it.
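
One thing that can help: flip on sqlboiler's debug output so you can see (and timestamp) every query it sends. Something like this (v4 import path shown; in v3, drop the /v4):

	import (
		"os"

		"github.com/volatiletech/sqlboiler/v4/boil"
	)

	func init() {
		boil.DebugMode = true        // print every generated query and its bind args
		boil.DebugWriter = os.Stderr // defaults to os.Stdout
	}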

aarondl avatar Nov 05 '20 00:11 aarondl

Thanks @aarondl. I ran VACUUM on the 3 tables involved and upgraded sqlboiler to 4.3.0. I noticed a great improvement in performance and actually thought the issue was solved, because at first it ran every call in one second or so, as expected.
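
For reference, the vacuum step was roughly:

VACUUM ANALYZE people;
VACUUM ANALYZE addresses;
VACUUM ANALYZE person_delimiters;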

But now, a few hours later, it's running a DB call every 6 seconds, which is better, but it's concerning that performance might keep degrading.

I am running this DB on AWS, on an mx4. instance (16 cores, 64GB RAM, SSD); nothing in my metrics (CPU load, freeable memory) stands out.

vitroz avatar Nov 05 '20 01:11 vitroz

Well, keep us posted. Sounds like the vacuuming is important. Again, I'm not 100% sure what to do from here; sqlboiler isn't the most performant thing when it comes to instantiating objects in that style.

If I were you, I'd probably replace it with something simpler than sqlboiler here, like raw queries with some deliberate pagination.
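
For example, keyset pagination seeks on the primary key instead of skipping rows, so deep pages stay as cheap as the first one. A hypothetical sketch with database/sql (fetchPage and the column list are made up; table and column names are from the schema above):

	import (
		"context"
		"database/sql"
	)

	// fetchPage returns the next batch of people after a known id.
	// Ordering by the uuid pkey is arbitrary but stable, which is all
	// pagination needs, and the WHERE clause turns each page into an
	// index range scan on people_pkey.
	func fetchPage(ctx context.Context, db *sql.DB, afterID string, limit int) (*sql.Rows, error) {
		return db.QueryContext(ctx, `
			SELECT id, first_name, last_name
			FROM people
			WHERE id > $1
			ORDER BY id
			LIMIT $2`, afterID, limit)
	}

Start with the zero uuid ('00000000-0000-0000-0000-000000000000') and feed the last id of each page into the next call; unlike OFFSET, Postgres never has to re-walk the rows you've already read.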

aarondl avatar Nov 17 '20 02:11 aarondl