
[WIP] Liberate tablespaces from Greenplum dbid

Open asimrp opened this issue 4 years ago • 3 comments

A segment's dbid uniquely identifies it among all postmasters within a Greenplum cluster. It is used primarily by FTS, but it was also repurposed for user-defined tablespaces: each segment would append a subdirectory named after its dbid to the user-specified tablespace location. This distinguished data objects created by multiple segments running on the same host.

We now use a randomly generated unique directory name, instead of a segment's dbid, for this purpose. A new interface, create_unique_directory(), is added to the pgcommon library for use by backend code and by frontend tools such as pg_basebackup and pg_rewind.
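To illustrate the idea, here is a minimal Python sketch of creating a randomly named subdirectory under a shared tablespace location. It is not the patch's C code: the signature, the 16-character hex name, the retry count, and the 0700 mode are all assumptions made for the example, since the thread does not show the real create_unique_directory() interface.

```python
import os
import secrets
import tempfile


def create_unique_directory(parent: str, attempts: int = 10) -> str:
    """Create a randomly named subdirectory of `parent` and return its path.

    Hypothetical Python analogue; the real pgcommon function's signature
    is not shown in this thread.
    """
    for _ in range(attempts):
        candidate = os.path.join(parent, secrets.token_hex(8))
        try:
            # mkdir fails if the name already exists, so two segments
            # can never end up sharing a directory.
            os.mkdir(candidate, 0o700)
            return candidate
        except FileExistsError:
            continue  # name collision: draw another random name
    raise RuntimeError("could not create a unique directory")


# Two segments on one host sharing the same user-specified location.
location = tempfile.mkdtemp()
seg_a = create_unique_directory(location)
seg_b = create_unique_directory(location)
```

Relying on mkdir to reject an existing name, rather than checking for existence first, avoids a race between segments that pick the same name at the same time.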

In the backend, at the time of tablespace creation, two unique subdirectory names are generated: one for the primary and another for the mirror. The mirror, when replaying tablespace creation, simply uses the unique name from WAL. The advantage is that the replay operation stays simple. Another option (not yet implemented in this patch) is to let the mirror also generate a unique name and detect conflicts based on its own filesystem.
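The primary/mirror flow described above could be sketched roughly as follows. The record class and both function names are hypothetical stand-ins, not the actual WAL record layout or backend functions, and the sketch assumes the mirror sees the tablespace location under the same path as the primary.

```python
import os
import secrets
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class TablespaceCreateRecord:
    """Hypothetical stand-in for the create-tablespace WAL record."""
    location: str
    primary_subdir: str
    mirror_subdir: str


def primary_create(location: str) -> TablespaceCreateRecord:
    # The primary picks both names up front but creates only its own directory.
    record = TablespaceCreateRecord(
        location, secrets.token_hex(8), secrets.token_hex(8)
    )
    os.mkdir(os.path.join(location, record.primary_subdir))
    return record


def mirror_replay(record: TablespaceCreateRecord) -> str:
    # The mirror generates nothing; it reuses the name carried in the record.
    path = os.path.join(record.location, record.mirror_subdir)
    os.makedirs(path, exist_ok=True)  # tolerate the record being replayed twice
    return path
```

Because the mirror's name travels in the record, replay needs no randomness and no conflict detection, which is the simplicity the description refers to.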

The benefit of not using dbid is that it brings the tablespace implementation in Greenplum a bit closer to that in PostgreSQL. Third-party extensions written for PostgreSQL no longer need to worry about dbid.

TODO: 1. refactor pg_rewind.c to use create_unique_directory() 2. run relevant tests

Here are some reminders before you submit the pull request

  • [ ] Add tests for the change
  • [ ] Document changes
  • [ ] Communicate in the mailing list if needed
  • [ ] Pass make installcheck
  • [ ] Review a PR in return to support the community

asimrp avatar Apr 12 '21 13:04 asimrp

@ashwinstar I've changed the patch so that the create-tablespace WAL record is not modified. The segment-specific subdir name continues to be a randomly generated string, but we now create a symlink named gp_segment that points to the owning segment's data directory.
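A rough Python sketch of the gp_segment idea follows. It assumes the symlink is placed inside the randomly named subdirectory, which the comment above does not state explicitly, and both function names are hypothetical.

```python
import os
import secrets
import tempfile


def create_segment_subdir(location: str, segment_datadir: str) -> str:
    """Create a random subdirectory and mark which segment owns it."""
    subdir = os.path.join(location, secrets.token_hex(8))
    os.mkdir(subdir)
    # Assumption: the gp_segment link lives inside the random subdirectory.
    os.symlink(segment_datadir, os.path.join(subdir, "gp_segment"))
    return subdir


def find_subdir_for_segment(location: str, segment_datadir: str):
    """Scan the tablespace location for the subdirectory owned by a segment."""
    target = os.path.realpath(segment_datadir)
    for name in os.listdir(location):
        link = os.path.join(location, name, "gp_segment")
        if os.path.islink(link) and os.path.realpath(link) == target:
            return os.path.join(location, name)
    return None  # this segment has no subdirectory under this location
```

With such a link in place, a segment or tool can identify its own subdirectory by inspecting the filesystem alone, without the name having to be carried in WAL.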

asimrp avatar Apr 15 '21 12:04 asimrp

@kalensk raised a question about how this change impacts upgrade. The intention is that this change should not impact upgrade. Let's consider a 6 to 7 upgrade scenario where user-defined tablespaces exist on 6 prior to upgrade. Before the upgrade, the tablespace layout, as seen from a segment, should look like the following:

<segment data dir>/pg_tblspc/<oid> links to -> <user specified location>/<dbid>

And there is a GP version dir as follows: <user specified location>/<dbid>/<GP_VERSION>
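For concreteness, the pre-upgrade layout above can be written as a small path-building sketch. The function name is hypothetical, and the version directory name is passed in as a parameter because its exact value is not given in this thread.

```python
import os


def legacy_tablespace_paths(datadir, tablespace_oid, location, dbid, gp_version_dir):
    """Return (symlink path, symlink target, version dir) for the dbid-based layout."""
    link = os.path.join(datadir, "pg_tblspc", str(tablespace_oid))
    target = os.path.join(location, str(dbid))
    version_dir = os.path.join(target, gp_version_dir)
    return link, target, version_dir


# Illustrative values only; none of them come from a real cluster.
link, target, version_dir = legacy_tablespace_paths(
    "/data/seg0", 16385, "/mnt/ts", 2, "GP_VERSION"
)
```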

Given this layout, how will it change after upgrade, with and without this patch?

asimrp avatar Apr 15 '21 14:04 asimrp

In general, yes, we do want to move away from using dbid in tablespace paths, mainly because:

  • using it required us to modify tools such as pg_basebackup and pg_rewind to pass the dbid
  • because of this requirement, the mirror has to be registered in the catalog first, to generate its dbid, before the mirror can be created
  • the postgresql.conf file has to be modified when a mirror is created from the primary, to update the dbid

So it will be helpful to continue exploring solutions that get rid of it.

ashwinstar avatar Sep 02 '22 22:09 ashwinstar

Closing as this enhancement is now tracked under https://github.com/greenplum-db/gpdb/issues/14338

ashwinstar avatar Oct 25 '22 23:10 ashwinstar