Support parallel index creation on GPDB
To support parallel index creation for pgvector, GPDB needs to support parallel scans in background workers (bgworkers). This requires the following changes:
- gp_session_id, dtxContextInfo, and numsegmentsFromQD should be passed to parallel workers
- distributedSnapshot should be initialized by setupQEDtxContext() on parallel workers
- cdb_setup() should be skipped on parallel workers: MyProcPort is uninitialized in InitPostgres() for them, and they do not need to initialize the cdb link or the motion layer.
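
The first bullet amounts to copying a small amount of leader state into shared memory and reading it back in each worker. A minimal sketch of that hand-off follows. It is not the GPDB implementation: the `Toy*` types and `toy_*` functions are hypothetical stand-ins, and a plain byte buffer takes the place of the shared-memory segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the distributed transaction context. */
typedef struct ToyDtxContextInfo
{
    uint64_t distributedXid;
    uint32_t segmateSync;
} ToyDtxContextInfo;

/* Hypothetical bundle of the state the leader hands to each worker. */
typedef struct ToyWorkerState
{
    int32_t           session_id;   /* models gp_session_id */
    int32_t           numsegments;  /* models numsegmentsFromQD */
    ToyDtxContextInfo dtx;          /* models dtxContextInfo */
} ToyWorkerState;

/* Leader side: how many bytes to reserve in the shared area. */
size_t
toy_estimate_worker_state(void)
{
    return sizeof(ToyWorkerState);
}

/* Leader side: copy the state into the reserved space. */
void
toy_serialize_worker_state(const ToyWorkerState *state, char *space)
{
    memcpy(space, state, sizeof(ToyWorkerState));
}

/* Worker side: rebuild the state from the shared area. */
void
toy_restore_worker_state(const char *space, ToyWorkerState *out)
{
    memcpy(out, space, sizeof(ToyWorkerState));
}
```

In this sketch the worker would call `toy_restore_worker_state()` before setting up its distributed snapshot, so that the session id, segment count, and transaction context are already in place at that point.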
This change also fixes a bug in ParallelWorkerMain(): the snapshot is left partially uninitialized after it is allocated.
    asnapspace = shm_toc_lookup(toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, false);
    tsnapspace = shm_toc_lookup(toc, PARALLEL_KEY_TRANSACTION_SNAPSHOT, true);
    asnapshot = RestoreSnapshot(asnapspace);
    tsnapshot = tsnapspace ? RestoreSnapshot(tsnapspace) : asnapshot;
    RestoreTransactionSnapshot(tsnapshot,
                               fps->parallel_leader_pgproc);
The restored snapshot is then used by RestoreTransactionSnapshot(), so its
haveDistribSnapshot field is read without ever having been initialized.
To fix this bug:
- haveDistribSnapshot should be stored in serialized_snapshot when serializing snapshots.
- distribSnapshotWithLocalMapping should also be stored in serialized_snapshot, because it is required when haveDistribSnapshot == true.
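
The fix can be illustrated with a simplified round trip. The sketch below is not the GPDB code: `ToySnapshot`, `ToySerializedSnapshot`, `ToyDistribSnapshot`, and the `toy_*` functions are hypothetical, and the real snapshot carries many more fields. It only shows the idea of writing the flag into the serialized header and appending the distributed part when the flag is set, so that the restore side never leaves the flag undefined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for distribSnapshotWithLocalMapping. */
typedef struct ToyDistribSnapshot
{
    uint64_t xmin;
    uint64_t xmax;
} ToyDistribSnapshot;

/* Hypothetical in-memory snapshot. */
typedef struct ToySnapshot
{
    uint32_t           xmin;
    uint32_t           xmax;
    bool               haveDistribSnapshot;
    ToyDistribSnapshot distrib;     /* meaningful only when the flag is set */
} ToySnapshot;

/* Hypothetical fixed-size header of the serialized form. */
typedef struct ToySerializedSnapshot
{
    uint32_t xmin;
    uint32_t xmax;
    bool     haveDistribSnapshot;   /* the flag now travels with the data */
} ToySerializedSnapshot;

/* Space needed: header, plus the distributed part when present. */
size_t
toy_estimate_snapshot_space(const ToySnapshot *snap)
{
    size_t size = sizeof(ToySerializedSnapshot);

    if (snap->haveDistribSnapshot)
        size += sizeof(ToyDistribSnapshot);
    return size;
}

void
toy_serialize_snapshot(const ToySnapshot *snap, char *space)
{
    ToySerializedSnapshot header;

    memset(&header, 0, sizeof(header));
    header.xmin = snap->xmin;
    header.xmax = snap->xmax;
    header.haveDistribSnapshot = snap->haveDistribSnapshot;
    memcpy(space, &header, sizeof(header));

    if (snap->haveDistribSnapshot)
        memcpy(space + sizeof(header), &snap->distrib, sizeof(snap->distrib));
}

void
toy_restore_snapshot(const char *space, ToySnapshot *out)
{
    ToySerializedSnapshot header;

    memcpy(&header, space, sizeof(header));
    out->xmin = header.xmin;
    out->xmax = header.xmax;

    /* Always set the flag, so callers never read an undefined value. */
    out->haveDistribSnapshot = header.haveDistribSnapshot;
    if (header.haveDistribSnapshot)
        memcpy(&out->distrib, space + sizeof(header), sizeof(out->distrib));
    else
        memset(&out->distrib, 0, sizeof(out->distrib));
}
```

Writing the distributed part only when the flag is set keeps the serialized form small for purely local snapshots, at the cost of a variable-length layout that the estimate function has to account for.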
Used for team discussion.