gpdb
gpdb copied to clipboard
Consider MPP FDW LIMIT pushdown when both offset and limit clause are specified
What this PR does
Before this PR, GPDB won't push OFFSET/LIMIT in the remote servers for MPP FDW when both offset and limit clause are specified.
Because it's unsafe if we don't adjust Remote SQL.
In this PR, we consider MPP FDW OFFSET/LIMIT pushdown when both OFFSET and LIMIT clause are specified.
We will adjust Remote SQL when mpp_execute = 'all segments' for query including OFFSET/LIMIT clause.
Implementation Details
There are 3 cases for query including OFFSET/LIMIT clause.
Case 1: only LIMIT is specified
Two-phase LIMIT plan is enough and there is NO need to adjust Remote SQL.
This case had been supported. (Look at PR https://github.com/greenplum-db/gpdb/pull/16177 for more information.)
Case 2: both LIMIT and OFFSET are specified
We also need to adjust Remote SQL to let GPDB fetch tuples from 0 to limitOffset + limitCount from remote servers.
-- Simple query with OFFSET and LIMIT clause together is pushed down.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT c1, c2 FROM mpp_ft2 order by c1 offset 2 limit 3;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Limit
Output: c1, c2
-> Gather Motion 2:1 (slice1; segments: 2)
Output: c1, c2
Merge Key: c1
-> Foreign Scan on public.mpp_ft2
Output: c1, c2
Remote SQL: SELECT c1, c2 FROM "MPP_S 1"."T 2" ORDER BY c1 ASC NULLS LAST LIMIT (2::bigint + 3::bigint)
Optimizer: Postgres-based planner
Case 3: only OFFSET is specified
In this case, we also need to adjust Remote SQL to let GPDB fetch all tuples of foreign table from remote servers.
-- Simple query with only OFFSET clause is NOT pushed down.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT c1, c2 FROM mpp_ft2 order by c1 offset 998;
QUERY PLAN
-----------------------------------------------------------------------------------------
Limit
Output: c1, c2
-> Gather Motion 2:1 (slice1; segments: 2)
Output: c1, c2
Merge Key: c1
-> Foreign Scan on public.mpp_ft2
Output: c1, c2
Remote SQL: SELECT c1, c2 FROM "MPP_S 1"."T 2" ORDER BY c1 ASC NULLS LAST
Optimizer: Postgres-based planner
Pipeline: https://dev.ci.gpdb.pivotal.io/teams/main/pipelines/gpdb-dev-improve_mpp_offset_limit_pushdown-rocky8?group=all