
Support parallel index creation on GPDB

Open zhrt123 opened this issue 1 year ago • 0 comments

To support parallel index creation for pgvector, GPDB needs to support parallel scans in background workers (bgworker). This requires the following changes:

  • gp_session_id, dtxContextInfo, and numsegmentsFromQD should be passed to parallel workers
  • distributedSnapshot should be initialized by setupQEDtxContext() on parallel workers
  • cdb_setup() should be skipped for parallel workers: MyProcPort is not initialized for them in InitPostgres(), and they do not need to initialize the cdb link or the motion layer.
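
The three points above can be modeled in a few lines of standalone C. This is only a sketch under stated assumptions, not GPDB code: WorkerCdbState, leader_pack_state(), init_process(), and the demo_ globals are hypothetical names, a plain byte buffer stands in for the shared memory segment, the DTX context is reduced to a single integer, and setting a flag stands in for calling cdb_setup().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified state that the leader hands to each worker. */
typedef struct WorkerCdbState
{
	int			gp_session_id;		/* session id of the leader */
	int			numsegmentsFromQD;	/* segment count reported by the QD */
	int			dtx_context_id;		/* placeholder for dtxContextInfo */
} WorkerCdbState;

/* Per-process globals of this demo (stand-ins, not the real GPDB globals). */
static int	demo_gp_session_id = -1;
static int	demo_numsegmentsFromQD = -1;
static int	demo_dtx_context_id = -1;
static bool demo_cdb_link_initialized = false;

/* Leader side: copy the current state into the shared buffer. */
size_t
leader_pack_state(char *shm, size_t shm_size)
{
	WorkerCdbState state;

	assert(shm_size >= sizeof(state));
	state.gp_session_id = demo_gp_session_id;
	state.numsegmentsFromQD = demo_numsegmentsFromQD;
	state.dtx_context_id = demo_dtx_context_id;
	memcpy(shm, &state, sizeof(state));
	return sizeof(state);
}

/* Process start-up: workers restore leader state and skip the link setup. */
void
init_process(const char *shm, bool is_parallel_worker)
{
	if (is_parallel_worker)
	{
		WorkerCdbState state;

		memcpy(&state, shm, sizeof(state));
		demo_gp_session_id = state.gp_session_id;
		demo_numsegmentsFromQD = state.numsegmentsFromQD;
		/* The distributed snapshot would be set up from this context. */
		demo_dtx_context_id = state.dtx_context_id;
		return;					/* no client connection, so no link setup */
	}

	/* Regular backend: stands in for calling cdb_setup(). */
	demo_cdb_link_initialized = true;
}
```

In this model, a worker started with is_parallel_worker set to true ends up with the leader's session id, segment count, and DTX context, while the link flag stays false.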

This change also fixes a bug: the snapshot is not fully initialized after it is allocated in ParallelWorkerMain():

	asnapspace = shm_toc_lookup(toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, false);
	tsnapspace = shm_toc_lookup(toc, PARALLEL_KEY_TRANSACTION_SNAPSHOT, true);
	asnapshot = RestoreSnapshot(asnapspace);
	tsnapshot = tsnapspace ? RestoreSnapshot(tsnapspace) : asnapshot;
	RestoreTransactionSnapshot(tsnapshot,
							   fps->parallel_leader_pgproc);

The restored snapshot is then passed to RestoreTransactionSnapshot(), where the haveDistribSnapshot field of Snapshot is read without ever having been initialized.

To fix this bug:

  • haveDistribSnapshot should be stored in serialized_snapshot when serializing snapshots.
  • distribSnapshotWithLocalMapping should also be stored in serialized_snapshot, because it is required when haveDistribSnapshot == true.
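
The fix can be illustrated with a reduced, standalone model of snapshot serialization. Everything here is an assumption made for illustration: DemoSnapshot, DemoSerializedSnapshot, DemoDistribMapping, and the two demo_ functions are hypothetical, and the distributed mapping is cut down to two fields (the real structure carries more). The sketch only shows that both fields travel through the serialized form, so the restore side never depends on uninitialized memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reduction of the distributed part of a snapshot. */
typedef struct DemoDistribMapping
{
	uint64_t	dist_xmin;
	uint64_t	dist_xmax;
} DemoDistribMapping;

/* In-memory snapshot (simplified). */
typedef struct DemoSnapshot
{
	uint32_t	xmin;
	uint32_t	xmax;
	bool		haveDistribSnapshot;
	DemoDistribMapping distribSnapshotWithLocalMapping;
} DemoSnapshot;

/* Flat form that is copied through shared memory. */
typedef struct DemoSerializedSnapshot
{
	uint32_t	xmin;
	uint32_t	xmax;
	bool		haveDistribSnapshot;	/* added by the fix */
	DemoDistribMapping distribSnapshotWithLocalMapping; /* added by the fix */
} DemoSerializedSnapshot;

/* Leader side: flatten the snapshot, including the distributed fields. */
size_t
demo_serialize_snapshot(const DemoSnapshot *snap, char *buf, size_t buf_size)
{
	DemoSerializedSnapshot ser;

	assert(buf_size >= sizeof(ser));
	memset(&ser, 0, sizeof(ser));	/* padding and unused mapping are zeroed */
	ser.xmin = snap->xmin;
	ser.xmax = snap->xmax;
	ser.haveDistribSnapshot = snap->haveDistribSnapshot;
	if (snap->haveDistribSnapshot)
		ser.distribSnapshotWithLocalMapping = snap->distribSnapshotWithLocalMapping;
	memcpy(buf, &ser, sizeof(ser));
	return sizeof(ser);
}

/* Worker side: every field of the target is assigned, none is left as-is. */
void
demo_restore_snapshot(const char *buf, DemoSnapshot *snap)
{
	DemoSerializedSnapshot ser;

	memcpy(&ser, buf, sizeof(ser));
	snap->xmin = ser.xmin;
	snap->xmax = ser.xmax;
	snap->haveDistribSnapshot = ser.haveDistribSnapshot;
	snap->distribSnapshotWithLocalMapping = ser.distribSnapshotWithLocalMapping;
}
```

Without the two added fields, demo_restore_snapshot() would have nothing to assign to haveDistribSnapshot, which corresponds to the uninitialized read described above.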

Used for team discussion.

zhrt123, Mar 19 '24 09:03