Spring data relational generates incorrect single query for PostgreSQL with two one-to-many relationships present
The following test fails on PostgreSQL 17.6:
The script src/main/resources/db.sql in the file is a database schema and test data.
There are two one-to-many relationships in the test. And one is larger than another. The incorrect query is generated:
SELECT
case when rn_company_1 = rn THEN c_name_4 else null end as c_name_4,
case when rn_company_1 = rn THEN c_url_5 else null end as c_url_5,
case when rn_company_1 = rn THEN c_industry_6 else null end as c_industry_6,
case when rn_company_1 = rn THEN c_description_7 else null end as c_description_7,
case when rn_company_office_9 = rn THEN c_company_id_14 else null end as c_company_id_14,
case when rn_company_office_9 = rn THEN c_name_15 else null end as c_name_15,
case when rn_company_office_9 = rn THEN c_city_16 else null end as c_city_16,
case when rn_company_office_9 = rn THEN c_address_17 else null end as c_address_17,
c_office_id_13,
key_company_office_12,
case when rn_contact_person_19 = rn THEN c_company_id_24 else null end as c_company_id_24,
case when rn_contact_person_19 = rn THEN c_name_25 else null end as c_name_25,
case when rn_contact_person_19 = rn THEN c_position_26 else null end as c_position_26,
c_contact_person_id_23,
key_contact_person_22,
c_company_id_3
FROM (
SELECT c_name_4, c_url_5, c_industry_6, c_description_7, rn_company_1, c_company_id_3, c_company_id_14,
c_name_15, c_city_16, c_address_17, rn_company_office_9, c_office_id_13, br_company_office_11,
key_company_office_12, c_company_id_24, c_name_25, c_position_26, rn_contact_person_19,
c_contact_person_id_23, br_contact_person_21, key_contact_person_22,
GREATEST(COALESCE(rn_company_1, 1), COALESCE(rn_company_office_9, 1), COALESCE(rn_contact_person_19, 1)) AS rn
FROM (
SELECT 1 AS rn_company_1, 1 AS rc_company_2, "company"."company_id" AS c_company_id_3, "company"."name" AS c_name_4,
"company"."url" AS c_url_5, "company"."industry" AS c_industry_6, "company"."description" AS c_description_7
FROM "company" WHERE "company"."company_id" IN (
--?, ?, ?, ?, ?, ?, ?, ?, ?, ?
'0198b795-ce94-78b3-a2bf-847992d3fb68'
)) t_company_8
LEFT OUTER JOIN (
SELECT row_number() OVER(
PARTITION BY "company_office"."company_id"
ORDER BY "company_office"."company_id"
) AS rn_company_office_9,
count(*) OVER(
PARTITION BY "company_office"."company_id"
) AS rc_company_office_10,
"company_office"."company_id" AS br_company_office_11,
row_number() OVER(
PARTITION BY "company_office"."company_id"
ORDER BY "company_office"."company_id"
) AS key_company_office_12,
"company_office"."office_id" AS c_office_id_13,
"company_office"."company_id" AS c_company_id_14, "company_office"."name" AS c_name_15,
"company_office"."city" AS c_city_16, "company_office"."address" AS c_address_17
FROM "company_office") t_company_office_18 ON c_company_id_3 = br_company_office_11
LEFT OUTER JOIN (
SELECT
row_number() OVER(
PARTITION BY "contact_person"."company_id"
ORDER BY "contact_person"."company_id"
) AS rn_contact_person_19,
count(*) OVER(PARTITION BY "contact_person"."company_id") AS rc_contact_person_20,
"contact_person"."company_id" AS br_contact_person_21,
row_number() OVER(
PARTITION BY "contact_person"."company_id"
ORDER BY "contact_person"."company_id"
) AS key_contact_person_22,
"contact_person"."contact_person_id" AS c_contact_person_id_23,
"contact_person"."company_id" AS c_company_id_24, "contact_person"."name" AS c_name_25, "contact_person"."position" AS c_position_26
FROM "contact_person"
) t_contact_person_27 ON c_company_id_3 = br_contact_person_21
WHERE (rn_company_office_9 = rn_contact_person_19 OR rn_company_office_9 IS NULL OR rn_contact_person_19 IS NULL
OR (rn_company_office_9 > rc_contact_person_20 AND rn_contact_person_19 = 1)
OR (rn_contact_person_19 > rc_company_office_10 AND rn_company_office_9 = 1))
) main
ORDER BY c_company_id_3, rn
The problem seems to be that c_office_id_13 is not null when office is missing.
| c_name_4 | c_url_5 | c_industry_6 | c_description_7 | c_company_id_14 | c_name_15 | c_city_16 | c_address_17 | c_office_id_13 | key_company_office_12 | c_company_id_24 | c_name_25 | c_position_26 | c_contact_person_id_23 | key_contact_person_22 | c_company_id_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Farrell-Roob R928229 | http://www.colby-wisozk.biz:54185/odit?facilis=provident&labore=incidunt#a | Legislative Office | Upgradable leading edge project | 0198b795-ce94-78b3-a2bf-847992d3fb68 | Chanda Lear | East Bellaton | 37060 Farrell Drives, Strosinburgh, WV 41639 | 0198b795-ce94-7f16-8496-902393f58e70 | 1 | 0198b795-ce94-78b3-a2bf-847992d3fb68 | Mrs. Len Homenick | Orchestrator | 0198b795-ce95-771e-893f-30663574852e | 1 | 0198b795-ce94-78b3-a2bf-847992d3fb68 |
| 0198b795-ce94-78b3-a2bf-847992d3fb68 | Sal A. Mander | New Marlana | Apt. 962 19078 Pete Parkways, Rutherfordmouth, SD 14193 | 0198b795-ce94-7ac9-80fb-1be5fa6890a2 | 2 | 0198b795-ce94-78b3-a2bf-847992d3fb68 | Miss Carlos Lakin | Analyst | 0198b795-ce95-7074-b56d-a555705a3a9f | 2 | 0198b795-ce94-78b3-a2bf-847992d3fb68 | ||||
| 0198b795-ce94-78b3-a2bf-847992d3fb68 | Brandon Cattell | Turcotteshire | 8019 Murazik Plains, Hanhmouth, NV 59964 | 0198b795-ce94-72f0-863b-ca44d7447629 | 3 | 0198b795-ce94-78b3-a2bf-847992d3fb68 | Jacquelynn Kozey | Architect | 0198b795-ce95-7adf-bbec-f66bcfbf1943 | 3 | 0198b795-ce94-78b3-a2bf-847992d3fb68 | ||||
| 0198b795-ce94-7f16-8496-902393f58e70 | 1 | 0198b795-ce94-78b3-a2bf-847992d3fb68 | Laverna Turcotte I | Designer | 0198b795-ce95-7ff8-891b-a2ae54cc7d6c | 4 | 0198b795-ce94-78b3-a2bf-847992d3fb68 |
This causes overriding data with nulls.
[ {
"id" : "0198b795-ce94-78b3-a2bf-847992d3fb68",
"name" : "Farrell-Roob R928229",
"url" : "http://www.colby-wisozk.biz:54185/odit?facilis=provident&labore=incidunt#a",
"industry" : "Legislative Office",
"description" : "Upgradable leading edge project",
"contactPersons" : [ {
"id" : "0198b795-ce95-771e-893f-30663574852e",
"companyId" : "0198b795-ce94-78b3-a2bf-847992d3fb68",
"name" : "Mrs. Len Homenick",
"position" : "Orchestrator"
}, {
"id" : "0198b795-ce95-7074-b56d-a555705a3a9f",
"companyId" : "0198b795-ce94-78b3-a2bf-847992d3fb68",
"name" : "Miss Carlos Lakin",
"position" : "Analyst"
}, {
"id" : "0198b795-ce95-7adf-bbec-f66bcfbf1943",
"companyId" : "0198b795-ce94-78b3-a2bf-847992d3fb68",
"name" : "Jacquelynn Kozey",
"position" : "Architect"
}, {
"id" : "0198b795-ce95-7ff8-891b-a2ae54cc7d6c",
"companyId" : "0198b795-ce94-78b3-a2bf-847992d3fb68",
"name" : "Laverna Turcotte I",
"position" : "Designer"
} ],
"offices" : [ {
"id" : "0198b795-ce94-7f16-8496-902393f58e70",
"companyId" : null,
"name" : null,
"city" : null,
"address" : null
}, {
"id" : "0198b795-ce94-7ac9-80fb-1be5fa6890a2",
"companyId" : "0198b795-ce94-78b3-a2bf-847992d3fb68",
"name" : "Sal A. Mander",
"city" : "New Marlana",
"address" : "Apt. 962 19078 Pete Parkways, Rutherfordmouth, SD 14193"
}, {
"id" : "0198b795-ce94-72f0-863b-ca44d7447629",
"companyId" : "0198b795-ce94-78b3-a2bf-847992d3fb68",
"name" : "Brandon Cattell",
"city" : "Turcotteshire",
"address" : "8019 Murazik Plains, Hanhmouth, NV 59964"
} ]
} ]
The query also contains useless order by in fragments like:
row_number() OVER(
PARTITION BY "contact_person"."company_id"
ORDER BY "contact_person"."company_id"
) AS key_contact_person_22,
In the window "contact_person"."company_id" has the same value, so it makes no sense to sort by it.
I also think that on PostgreSQL the simpler strategy with CTE would have worked that would have produced more readable queries, I think other database should support it as well:
with company_data as (
select
row_number() over (order by c.name, c.company_id) level1,
1 level2,
c.*
from company c
where company_id in ('0198b795-ce94-78b3-a2bf-847992d3fb68')
), company_office_data as (
select
level1,
row_number() over (
partition by cd.level1
order by co.name desc, co.office_id desc
) as level2,
co.*
from company_data cd
join company_office co on co.company_id = cd.company_id
), contact_person_data as (
select
level1,
row_number() over (
partition by cd.level1
order by cp.name, cp.contact_person_id
) as level2,
cp.*
from company_data cd
join contact_person cp on cp.company_id = cd.company_id
)
select
cd.company_id as c_company_id,
cd.description as c_description,
cd.industry as c_industry,
cd."name" as c_name,
cd.url as c_url,
cod.office_id as o_office_id,
cod.address as o_address,
cod.city as o_city,
cod."name" as o_name,
cpd.contact_person_id as p_contact_person_id,
cpd.name as p_name,
cpd.position as p_position
from company_data cd
full outer join company_office_data cod
on cd.level1 = cod.level1 and cod.level2 = cd.level2
full outer join contact_person_data cpd
on coalesce(cd.level1, cod.level1) = cpd.level1
and coalesce(cod.level2, cd.level2) = cpd.level2
order by coalesce(cd.level1, cod.level1, cpd.level1), coalesce(cd.level2, cod.level2, cpd.level2, 1)
This strategy could be naturally extended to the nested one-to-many relationships and and one-to-one relationships.
If schema from test is extended as:
-- public.verification_info definition
-- Drop table
-- DROP TABLE public.verification_info;
CREATE TABLE IF NOT EXISTS public.verification_info (
"comment" varchar(255) NULL,
status varchar(255) NOT NULL,
"timestamp" timestamptz(6) NOT NULL,
username varchar(255) NOT NULL,
company_id uuid NOT NULL,
CONSTRAINT verification_info_pkey PRIMARY KEY (company_id),
CONSTRAINT verification_info_status_check CHECK (((status)::text = ANY ((ARRAY['VERIFIED'::character varying, 'INVALID'::character varying])::text[]))),
CONSTRAINT fk2631d1desupjf5fo8mtgd7srv FOREIGN KEY (company_id) REFERENCES public.company(company_id)
);
CREATE INDEX IF NOT EXISTS verification_info_by_company_id_idx ON public.verification_info USING btree (company_id);
-- public.contact_detail definition
-- Drop table
-- DROP TABLE public.contact_detail;
CREATE TABLE IF NOT EXISTS public.contact_detail (
contact_person_id uuid NOT NULL,
contact_type varchar(255) NOT NULL,
value varchar(255) NOT NULL,
CONSTRAINT contact_detail_contact_type_check CHECK (((contact_type)::text = ANY ((ARRAY['EMAIL'::character varying, 'PHONE'::character varying, 'TELEGRAM'::character varying])::text[]))),
CONSTRAINT contact_detail_pkey PRIMARY KEY (contact_person_id, contact_type, value),
CONSTRAINT fkh7yxxcj9rqwapyld8ll4kswqs FOREIGN KEY (contact_person_id) REFERENCES public.contact_person(contact_person_id)
);
CREATE INDEX IF NOT EXISTS contact_detail_by_contact_person_id_idx ON public.contact_detail USING btree (contact_person_id);
CREATE INDEX IF NOT EXISTS contact_detail_by_value_contact_person_id_idx ON public.contact_detail USING btree (value, contact_person_id);
INSERT INTO contact_detail (contact_person_id,contact_type,value) VALUES
('0198b795-ce95-7074-b56d-a555705a3a9f'::uuid,'EMAIL','[email protected]'),
('0198b795-ce95-771e-893f-30663574852e'::uuid,'PHONE','(983) 409-5386'),
('0198b795-ce95-771e-893f-30663574852e'::uuid,'EMAIL','[email protected]'),
('0198b795-ce95-7adf-bbec-f66bcfbf1943'::uuid,'PHONE','(936) 459-0173'),
('0198b795-ce95-7ff8-891b-a2ae54cc7d6c'::uuid,'PHONE','(730) 214-9396'),
('0198b795-ce95-7ff8-891b-a2ae54cc7d6c'::uuid,'EMAIL','[email protected]') ON CONFLICT DO NOTHING;
INSERT INTO verification_info
(company_id, "comment", status, "timestamp", username)
VALUES( '0198b795-ce94-78b3-a2bf-847992d3fb68'::uuid, NULL, 'VERIFIED', '2024-09-01 18:12:23.528', 'maira.schimmel') ON CONFLICT DO NOTHING;
The full query is only incrementally more complex, as the level 3 is needed to be reduced to the level 2.
with company_data as (
select
row_number() over (order by c.name, c.company_id) level1,
1 level2,
c.*
from company c
where company_id in ('0198b795-ce94-78b3-a2bf-847992d3fb68')
), company_office_data as (
select
level1,
row_number() over (
partition by cd.level1
order by co.name, co.office_id
) as level2,
co.*
from company_data cd
join company_office co on co.company_id = cd.company_id
), contact_person_data as (
select
level1,
row_number() over (
partition by cd.level1
order by cp.name, cp.contact_person_id
) as level2,
1 level3,
cp.*
from company_data cd
join contact_person cp on cp.company_id = cd.company_id
), contact_detail_data as (
select
level1,
level2,
row_number() over (
partition by cpd.level1, cpd.level2
order by cd.contact_type, cd.value
) level3,
cd.*
from contact_person_data cpd
join contact_detail cd on cpd.contact_person_id = cd.contact_person_id
), contact_person_detail_data as (
select
coalesce(cpd.level1, cdd.level1) as level1,
row_number() over (
partition by coalesce(cpd.level1, cdd.level1)
order by coalesce(cpd.level2, cdd.level2), coalesce(cdd.level3, 1)
) as level2,
cpd.contact_person_id as p_contact_person_id,
cpd."name" as p_name,
cpd."position" as p_position,
cdd.contact_type as d_contact_type,
cdd.value as d_value
from contact_person_data cpd
full outer join contact_detail_data cdd
on cpd.level1 = cdd.level1 and cpd.level2 = cdd.level2 and cdd.level3 = cpd.level3
)
select
cd.company_id as c_company_id,
cd.description as c_description,
cd.industry as c_industry,
cd."name" as c_name,
vi.company_id as vi_company_id,
vi."comment" as vi_comment,
vi.status as vi_status,
vi."timestamp" as vi_timestamp,
vi.username as vi_username,
cd.url as c_url,
cod.office_id as o_office_id,
cod.address as o_address,
cod.city as o_city,
cod."name" as o_name,
cpd.p_contact_person_id,
cpd.p_name,
cpd.p_position,
cpd.d_contact_type,
cpd.d_value
from company_data cd
left join verification_info vi on cd.company_id = vi.company_id
full outer join company_office_data cod
on cd.level1 = cod.level1 and cod.level2 = cd.level2
full outer join contact_person_detail_data cpd
on coalesce(cd.level1, cod.level1) = cpd.level1
and coalesce(cod.level2, cd.level2) = cpd.level2
order by coalesce(cd.level1, cod.level1, cpd.level1), coalesce(cd.level2, cod.level2, cpd.level2, 1)
Also, explain analyze on the query from spring data relational has the following output (database with millions rows, selecting 17 rows):
Sort (cost=377969.47..377969.47 rows=2 width=392) (actual time=11.304..11.306 rows=47 loops=1)
Sort Key: company.company_id, (GREATEST('1'::bigint, COALESCE((row_number() OVER (?)), '1'::bigint), COALESCE((row_number() OVER (?)), '1'::bigint)))
Sort Method: quicksort Memory: 36kB
-> Merge Right Join (cost=176507.08..377969.46 rows=2 width=392) (actual time=11.226..11.273 rows=47 loops=1)
Merge Cond: (contact_person.company_id = company.company_id)
Filter: (((row_number() OVER (?)) = (row_number() OVER (?))) OR ((row_number() OVER (?)) IS NULL) OR ((row_number() OVER (?)) IS NULL) OR (((row_number() OVER (?)) > (count(*) OVER (?))) AND ((row_number() OVER (?)) = 1)) OR (((row_number() OVER (?)) > (count(*) OVER (?))) AND ((row_number() OVER (?)) = 1)))
Rows Removed by Filter: 17
-> WindowAgg (cost=0.62..170214.27 rows=2499702 width=97) (actual time=11.095..11.122 rows=45 loops=1)
-> WindowAgg (cost=0.48..132718.74 rows=2499702 width=65) (actual time=11.077..11.101 rows=48 loops=1)
-> Index Scan using contact_persons_by_company_id_idx on contact_person (cost=0.43..88973.96 rows=2499702 width=57) (actual time=11.060..11.070 rows=48 loops=1)
-> Sort (cost=176506.46..176506.54 rows=34 width=284) (actual time=0.120..0.122 rows=63 loops=1)
Sort Key: company.company_id
Sort Method: quicksort Memory: 33kB
-> Merge Right Join (cost=0.99..176505.59 rows=34 width=284) (actual time=0.076..0.106 rows=28 loops=1)
Merge Cond: (company_office.company_id = company.company_id)
-> WindowAgg (cost=0.56..151417.93 rows=2000621 width=144) (actual time=0.053..0.078 rows=32 loops=1)
-> WindowAgg (cost=0.49..121408.61 rows=2000621 width=112) (actual time=0.046..0.062 rows=35 loops=1)
-> Index Scan using company_office_by_company_id_idx on company_office (cost=0.43..86397.74 rows=2000621 width=104) (actual time=0.043..0.046 rows=35 loops=1)
-> Index Scan using company_pkey on company (cost=0.42..79.52 rows=17 width=156) (actual time=0.017..0.019 rows=17 loops=1)
Index Cond: (company_id = ANY ('{0198b791-fb2d-7ec9-9495-47b3704ac8d2,0198b791-fc0d-7845-a5c0-f566553d0aff,0198b791-fc0f-7f83-88c1-10cc7b84170e,0198b791-fc11-7be0-be54-658e1a1c935a,0198b791-fc11-7c02-af71-f90f960b6b2c,0198b791-fc12-7386-acf3-2dd8dd4fdcd1,0198b791-fc13-7477-997e-35a06c6602e1,0198b791-fc13-7379-8823-5f919438c5dc,0198b791-fc13-7b5c-9bd1-2f9dc1b57c36,0198b791-fc14-7b9c-9738-95f6232c7d59,0198b791-fc14-71fd-83f6-b6f681a3b89a,0198b791-fc15-7c4e-8716-605610e7716a,0198b791-fc15-7b0d-b6ef-b031bdc16cb0,0198b791-fc16-780a-9c7c-ccbd1cd68509,0198b791-fc16-7585-9966-4af999efccab,0198b791-fc16-78eb-a472-fb3d2f7b04d4,0198b791-fc17-73b8-9380-09c5188a4f43}'::uuid[]))
Planning Time: 0.607 ms
JIT:
Functions: 32
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 0.968 ms (Deform 0.406 ms), Inlining 0.000 ms, Optimization 0.560 ms, Emission 10.474 ms, Total 12.002 ms
Execution Time: 21.471 ms
Note that index scan over contact_person and company_office does not have conditions. So it practically selects the entire table.
Compare with plan of my query:
Sort (cost=797.71..798.12 rows=162 width=3510) (actual time=0.465..0.468 rows=75 loops=1)
Sort Key: (COALESCE(cd.level1, cod.level1, cpd.level1)), (COALESCE((cd.level2)::bigint, cod.level2, cpd.level2, '1'::bigint))
Sort Method: quicksort Memory: 36kB
CTE company_data
-> WindowAgg (cost=79.89..80.21 rows=17 width=168) (actual time=0.036..0.041 rows=17 loops=1)
-> Sort (cost=79.87..79.91 rows=17 width=156) (actual time=0.033..0.033 rows=17 loops=1)
Sort Key: c.name, c.company_id
Sort Method: quicksort Memory: 27kB
-> Index Scan using company_pkey on company c (cost=0.42..79.52 rows=17 width=156) (actual time=0.014..0.016 rows=17 loops=1)
Index Cond: (company_id = ANY ('{0198b791-fb2d-7ec9-9495-47b3704ac8d2,0198b791-fc0d-7845-a5c0-f566553d0aff,0198b791-fc0f-7f83-88c1-10cc7b84170e,0198b791-fc11-7be0-be54-658e1a1c935a,0198b791-fc11-7c02-af71-f90f960b6b2c,0198b791-fc12-7386-acf3-2dd8dd4fdcd1,0198b791-fc13-7477-997e-35a06c6602e1,0198b791-fc13-7379-8823-5f919438c5dc,0198b791-fc13-7b5c-9bd1-2f9dc1b57c36,0198b791-fc14-7b9c-9738-95f6232c7d59,0198b791-fc14-71fd-83f6-b6f681a3b89a,0198b791-fc15-7c4e-8716-605610e7716a,0198b791-fc15-7b0d-b6ef-b031bdc16cb0,0198b791-fc16-780a-9c7c-ccbd1cd68509,0198b791-fc16-7585-9966-4af999efccab,0198b791-fc16-78eb-a472-fb3d2f7b04d4,0198b791-fc17-73b8-9380-09c5188a4f43}'::uuid[]))
CTE contact_person_data
-> WindowAgg (cost=146.59..147.76 rows=53 width=77) (actual time=0.092..0.104 rows=42 loops=1)
-> Sort (cost=146.57..146.70 rows=53 width=65) (actual time=0.091..0.093 rows=42 loops=1)
Sort Key: cd_3.level1, cp.name, cp.contact_person_id
Sort Method: quicksort Memory: 28kB
-> Nested Loop (cost=0.43..145.05 rows=53 width=65) (actual time=0.042..0.070 rows=42 loops=1)
-> CTE Scan on company_data cd_3 (cost=0.00..0.34 rows=17 width=24) (actual time=0.037..0.046 rows=17 loops=1)
-> Index Scan using contact_persons_by_company_id_idx on contact_person cp (cost=0.43..8.48 rows=3 width=57) (actual time=0.001..0.001 rows=2 loops=17)
Index Cond: (company_id = cd_3.company_id)
-> Hash Full Join (cost=419.56..563.80 rows=162 width=3510) (actual time=0.391..0.424 rows=75 loops=1)
Hash Cond: ((COALESCE(cd.level1, cod.level1) = cpd.level1) AND (COALESCE(cod.level2, (cd.level2)::bigint) = cpd.level2))
-> Hash Full Join (cost=148.09..291.54 rows=42 width=2446) (actual time=0.076..0.093 rows=28 loops=1)
Hash Cond: ((cd.level1 = cod.level1) AND (cd.level2 = cod.level2))
-> Nested Loop Left Join (cost=0.42..143.74 rows=17 width=2342) (actual time=0.004..0.015 rows=17 loops=1)
-> CTE Scan on company_data cd (cost=0.00..0.34 rows=17 width=2092) (actual time=0.000..0.001 rows=17 loops=1)
-> Index Scan using verification_info_by_company_id_idx on verification_info vi (cost=0.42..8.44 rows=1 width=250) (actual time=0.001..0.001 rows=0 loops=17)
Index Cond: (company_id = cd.company_id)
-> Hash (cost=147.04..147.04 rows=42 width=104) (actual time=0.049..0.050 rows=28 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Subquery Scan on cod (cost=145.70..147.04 rows=42 width=104) (actual time=0.033..0.043 rows=28 loops=1)
-> WindowAgg (cost=145.70..146.62 rows=42 width=120) (actual time=0.033..0.041 rows=28 loops=1)
-> Sort (cost=145.67..145.78 rows=42 width=96) (actual time=0.032..0.033 rows=28 loops=1)
Sort Key: cd_1.level1, co.name, co.office_id
Sort Method: quicksort Memory: 28kB
-> Nested Loop (cost=0.43..144.54 rows=42 width=96) (actual time=0.005..0.022 rows=28 loops=1)
-> CTE Scan on company_data cd_1 (cost=0.00..0.34 rows=17 width=24) (actual time=0.000..0.001 rows=17 loops=1)
-> Index Scan using company_office_by_company_id_idx on company_office co (cost=0.43..8.46 rows=2 width=104) (actual time=0.001..0.001 rows=2 loops=17)
Index Cond: (company_id = cd_1.company_id)
-> Hash (cost=269.04..269.04 rows=162 width=1092) (actual time=0.308..0.308 rows=74 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 16kB
-> Subquery Scan on cpd (cost=263.78..269.04 rows=162 width=1092) (actual time=0.275..0.296 rows=74 loops=1)
-> WindowAgg (cost=263.78..267.42 rows=162 width=1108) (actual time=0.274..0.291 rows=74 loops=1)
-> Sort (cost=263.78..264.18 rows=162 width=1100) (actual time=0.273..0.275 rows=74 loops=1)
Sort Key: (COALESCE(cpd_1.level1, cpd_2.level1)), (COALESCE(cpd_1.level2, cpd_2.level2)), (COALESCE((row_number() OVER (?)), '1'::bigint))
Sort Method: quicksort Memory: 31kB
-> Hash Full Join (cost=8.57..257.83 rows=162 width=1100) (actual time=0.206..0.257 rows=74 loops=1)
Hash Cond: ((cpd_2.level1 = cpd_1.level1) AND (cpd_2.level2 = cpd_1.level2) AND ((row_number() OVER (?)) = cpd_1.level3))
-> WindowAgg (cost=6.59..252.39 rows=162 width=68) (actual time=0.069..0.108 rows=74 loops=1)
-> Incremental Sort (cost=5.06..248.75 rows=162 width=44) (actual time=0.067..0.090 rows=74 loops=1)
Sort Key: cpd_2.level1, cpd_2.level2, cd_2.contact_type, cd_2.value
Presorted Key: cpd_2.level1
Full-sort Groups: 3 Sort Method: quicksort Average Memory: 27kB Peak Memory: 27kB
-> Nested Loop (cost=0.43..244.36 rows=162 width=44) (actual time=0.035..0.073 rows=74 loops=1)
-> CTE Scan on contact_person_data cpd_2 (cost=0.00..1.06 rows=53 width=32) (actual time=0.000..0.002 rows=42 loops=1)
-> Index Only Scan using contact_detail_pkey on contact_detail cd_2 (cost=0.43..4.56 rows=3 width=44) (actual time=0.001..0.001 rows=2 loops=42)
Index Cond: (contact_person_id = cpd_2.contact_person_id)
Heap Fetches: 0
-> Hash (cost=1.06..1.06 rows=53 width=1068) (actual time=0.126..0.126 rows=42 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> CTE Scan on contact_person_data cpd_1 (cost=0.00..1.06 rows=53 width=1068) (actual time=0.093..0.117 rows=42 loops=1)
Planning Time: 1.013 ms
Execution Time: 0.629 ms
All index scans are with conditions.
So hypothesis that join conditions are propagated to back to subselects is not true on PostgreSQL 17.6. This causes much higher time in the query execution. My query cost is 797.71, Spring Data Relational query cost is 377969.47. That is about 500 times higher cost in Spring Data Relational. I suggest to do performance tests on the large data sets, this problem is hardly visible when the tables contain only hundreds of rows.
Hi! I've reviewed this issue and I'm interested in working on it.
I can see the problem with the query generation for multiple one-to-many relationships and understand the CTE-based solution you've proposed. Would it be okay if I take on this issue? I'd like to start by reproducing the problem locally and then work on implementing the fix. If it's okay to proceed, could you let me know which part of the codebase handles the query generation?
Thanks!
A PR would be appreciated. But be warned, I don't consider this a good first issue, since the code generation is rather complex.
But if you still want to give it a try go for it.
The sql generation for a single query loading happens in SingleQuerySqlGenerator.
Remember to make sure all integration tests work with all the databases by running
./mvnw clean verify -Dall-dbs
Just to note, I've noticed the performance part of the problem on PostgreSQL when I run tests on a relatively large database (1 million of root records and 2 times of that of the dependent objects). On such database the difference between queries is very big. On small database with few records, PostgreSQL is quick even with full index scan.
A first PR should fix the bug. If it improves performance that would be great. But correct goes before fast.
Thank you @schauder for the guidance and warning about the complexity - I'm ready to tackle this challenge!
I'll start by reproducing the issue locally, studying the SingleQuerySqlGenerator code, and running the full test suite with ./mvnw clean verify -Dall-dbs.
@const, your performance insights with large datasets are very helpful context. I'll focus on correctness first as suggested, then work on a PR.
Let me get started and I'll reach out if I run into any blockers!
@lsh1215, I published a dataset generator for my sample database structure when I was investigating other issue that was actually an Intel CPU bug: https://github.com/const/pg-json-uuid-bug/tree/main/src/main/java/sample/prepare
You could tweak it if you are going to do some performance tests.
@const Thanks for sharing! That'll be really helpful for testing.
There is another related bug.
If RCompany from test is replaced by the following class:
package sample.model;
import lombok.Getter;
import lombok.Setter;
import lombok.ToString;
import org.springframework.data.annotation.Id;
import org.springframework.data.relational.core.mapping.Column;
import org.springframework.data.relational.core.mapping.MappedCollection;
import org.springframework.data.relational.core.mapping.Table;
import java.util.*;
@Getter
@Setter
@ToString
@Table(name = "company")
public class RCompany {
@Id
@Column("company_id")
private UUID id;
@Column("name")
private String name;
private String url;
private String industry;
private String description;
@MappedCollection(idColumn = "company_id", keyColumn = "company_id")
private List<RContactPerson> contactPersons;
@MappedCollection(idColumn = "company_id", keyColumn = "company_id")
private List<ROffice> offices;
}
There is a Set is changed to List and no other changes.
In that case there is an exception:
java.lang.ClassCastException: class java.util.UUID cannot be cast to class java.lang.Number (java.util.UUID and java.lang.Number are in module java.base of loader 'bootstrap')
at org.springframework.data.jdbc.core.convert.RowDocumentExtractorSupport$ListContainer.add(RowDocumentExtractorSupport.java:479)
at org.springframework.data.jdbc.core.convert.RowDocumentExtractorSupport$ContainerSink.getResult(RowDocumentExtractorSupport.java:437)
at org.springframework.data.jdbc.core.convert.RowDocumentExtractorSupport$RowDocumentSink.lambda$getResult$0(RowDocumentExtractorSupport.java:302)
at java.base/java.util.LinkedHashMap.forEach(LinkedHashMap.java:986)
at org.springframework.data.jdbc.core.convert.RowDocumentExtractorSupport$RowDocumentSink.getResult(RowDocumentExtractorSupport.java:299)
at org.springframework.data.jdbc.core.convert.RowDocumentResultSetExtractor$RowDocumentIterator.next(RowDocumentResultSetExtractor.java:281)
at org.springframework.data.jdbc.core.convert.RowDocumentResultSetExtractor$RowDocumentIterator.next(RowDocumentResultSetExtractor.java:185)
at org.springframework.data.jdbc.core.convert.AggregateReader.extractAll(AggregateReader.java:196)
at org.springframework.data.jdbc.core.convert.AggregateReader.lambda$findAll$2(AggregateReader.java:157)
at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:733)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:658)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:723)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:748)
at org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate.query(NamedParameterJdbcTemplate.java:178)
at org.springframework.data.jdbc.core.convert.AggregateReader.doFind(AggregateReader.java:167)
at org.springframework.data.jdbc.core.convert.AggregateReader.findAll(AggregateReader.java:157)
at org.springframework.data.jdbc.core.convert.AggregateReader.findAllById(AggregateReader.java:132)
at org.springframework.data.jdbc.core.convert.SingleQueryDataAccessStrategy.findAllById(SingleQueryDataAccessStrategy.java:68)
at org.springframework.data.jdbc.core.convert.SingleQueryFallbackDataAccessStrategy.findAllById(SingleQueryFallbackDataAccessStrategy.java:85)
at org.springframework.data.jdbc.core.JdbcAggregateTemplate.findAllById(JdbcAggregateTemplate.java:358)
at org.springframework.data.jdbc.repository.support.SimpleJdbcRepository.findAllById(SimpleJdbcRepository.java:95)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:359)
at org.springframework.data.repository.core.support.RepositoryMethodInvoker$RepositoryFragmentMethodInvoker.lambda$new$0(RepositoryMethodInvoker.java:277)
at org.springframework.data.repository.core.support.RepositoryMethodInvoker.doInvoke(RepositoryMethodInvoker.java:170)
at org.springframework.data.repository.core.support.RepositoryMethodInvoker.invoke(RepositoryMethodInvoker.java:158)
at org.springframework.data.repository.core.support.RepositoryComposition$RepositoryFragments.invoke(RepositoryComposition.java:515)
at org.springframework.data.repository.core.support.RepositoryComposition.invoke(RepositoryComposition.java:284)
at org.springframework.data.repository.core.support.RepositoryFactorySupport$ImplementationMethodExecutionInterceptor.invoke(RepositoryFactorySupport.java:734)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.doInvoke(QueryExecutorMethodInterceptor.java:174)
at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.invoke(QueryExecutorMethodInterceptor.java:149)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:380)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:138)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:223)
at jdk.proxy2/jdk.proxy2.$Proxy87.findAllById(Unknown Source)
at sample.SampleTest.shouldHaveNonNullOfficeNames(SampleTest.java:83)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
The place where it fails is also has a suspicious fragment in org.springframework.data.jdbc.core.convert.RowDocumentExtractorSupport.ListContainer.
// TODO: Are we 0 or 1 based?
private static class ListContainer extends CollectionContainer {
private final Map<Number, Object> list = new TreeMap<>(Comparator.comparing(Number::longValue));
@Override
public void add(Object key, @Nullable Object value) {
list.put(((Number) key).intValue() - 1, value);
}
// skipped
}
Note, that the comparator converts value to longValue(), but keys are put as intValue(). I think that there is inconsistency here.