Make PostgreSQL multi-threaded

Open hlinnaka opened this issue 2 years ago • 3 comments

I'm using this issue to keep track of things that we need to do to make PostgreSQL multi-threaded. To be updated as we go.

Discussion on pgsql-hackers: https://www.postgresql.org/message-id/flat/31cc6df9-53fe-3cd9-af5b-ac0d801163f4%40iki.fi

PG Wiki page: https://wiki.postgresql.org/wiki/Multithreading (mostly from pgconf.dev 2024)

Prior art: Konstantin's old branch: https://github.com/postgrespro/postgresql.pthreads

Some very preliminary hacking is on this branch: https://github.com/hlinnaka/postgres/tree/threading. I used a similar approach to Konstantin's for labeling all global variables.

See also: https://github.com/cmu-db/peloton/wiki/Postgres-Modifications

TODOs:

  • [ ] Label all global variables with markers like 'session_local', 'postmaster_guc', etc., to record what they are used for.
  • [ ] Have a tool that checks that all global variables have been labeled.
  • [ ] Extension support: add something to the control file to label whether an extension can run in multi-threaded mode.
  • [ ] Lots more; add tasks here later.

Global variables

We have a lot of global and static variables:

$ objdump -t bin/postgres | grep -e ".data" -e ".bss" | grep -v "data.rel.ro" | wc -l
1666

Some of them are pointers to shared memory structures and can stay as they are. But many of them are per-connection state. The most straightforward conversion for those is to turn them into thread-local variables, like Konstantin did in [0].
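A minimal sketch of the thread-local conversion, assuming C11 _Thread_local and a USE_THREADS build switch (hypothetical, as is the MyBackendCounter variable). The 'session_local' marker from the TODO list expands to _Thread_local in a threaded build and to nothing in today's process-per-connection build, so each connection thread gets its own copy of a labeled variable:

```c
#include <pthread.h>
#include <stddef.h>

/*
 * Assumption: a build switch selecting the multi-threaded server.
 * In a threaded build the 'session_local' marker becomes thread-local
 * storage; in a process-per-connection build it expands to nothing and
 * the variable stays an ordinary global, as today.
 */
#define USE_THREADS 1

#ifdef USE_THREADS
#define session_local _Thread_local
#else
#define session_local
#endif

/* Hypothetical per-connection state, labeled with the marker. */
session_local int MyBackendCounter = 0;

struct session_arg
{
    int iterations;   /* how many times this "connection" bumps its counter */
    int observed;     /* value the thread saw at the end */
};

static void *session_main(void *p)
{
    struct session_arg *arg = p;

    for (int i = 0; i < arg->iterations; i++)
        MyBackendCounter++;          /* touches this thread's copy only */
    arg->observed = MyBackendCounter;
    return NULL;
}

/* Run two simulated connections concurrently; 0 on success, -1 on error. */
int run_two_sessions(int *a, int *b)
{
    pthread_t t1, t2;
    struct session_arg s1 = {3, 0};
    struct session_arg s2 = {5, 0};

    if (pthread_create(&t1, NULL, session_main, &s1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, session_main, &s2) != 0)
    {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    *a = s1.observed;
    *b = s2.observed;
    return 0;
}
```

Each thread sees only its own increments (3 and 5), and the calling thread's copy of MyBackendCounter stays 0.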

It might be good to have some kind of Session context struct that we pass everywhere, or maybe a single thread-local variable to hold it. Many of the global variables would become fields in the Session. But that's future work.
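A sketch of the "single thread-local variable" variant, under the assumption that session state is gathered into one struct. The Session fields, CurrentSession, and the session_start/session_end helpers are all hypothetical names, not existing PostgreSQL code; the work_mem default of 4096 kB mirrors the real GUC default:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical Session struct: fields that today are separate globals. */
typedef struct Session
{
    int  backend_id;
    char database_name[64];
    int  work_mem_kb;            /* a per-session GUC value */
} Session;

/* One thread-local pointer instead of many thread-local globals. */
static _Thread_local Session *CurrentSession = NULL;

/* Allocate and install the session for the calling thread. */
Session *session_start(int backend_id, const char *dbname)
{
    Session *s = calloc(1, sizeof(Session));   /* zeroed, so name is terminated */

    if (s == NULL)
        return NULL;
    s->backend_id = backend_id;
    strncpy(s->database_name, dbname, sizeof(s->database_name) - 1);
    s->work_mem_kb = 4096;
    CurrentSession = s;
    return s;
}

void session_end(void)
{
    free(CurrentSession);
    CurrentSession = NULL;
}

/* Code that used to read a global now goes through CurrentSession. */
int get_work_mem_kb(void) { return CurrentSession->work_mem_kb; }
void set_work_mem_kb(int kb) { CurrentSession->work_mem_kb = kb; }
```

The appeal of this layout is that only one variable needs thread-local storage; the trade-off is an extra pointer dereference on every access to session state.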

Extensions

A lot of extensions also contain global variables or other things that break in a multi-threaded environment. We need a way to label extensions that support multi-threading. And in the future, also extensions that require a multi-threaded server.

Let's add flags to the control file to mark if the extension is thread-safe and/or process-safe. If you try to load an extension that's not compatible with the server's mode, throw an error.
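A minimal sketch of the load-time check, assuming two hypothetical boolean keys, thread_safe and process_safe, in the .control file (written in the existing "key = value" style). The struct, enum, and function names are illustrative only:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef enum { SERVER_MODE_PROCESS, SERVER_MODE_THREAD } ServerMode;

/* Hypothetical flags parsed from an extension's .control file. */
typedef struct ExtensionControl
{
    const char *name;
    bool        thread_safe;
    bool        process_safe;
} ExtensionControl;

/* Apply one "key = value" line; unknown keys and malformed lines are ignored. */
void apply_control_line(ExtensionControl *ext, const char *line)
{
    char key[64], value[64];

    if (sscanf(line, " %63[a-z_] = %63s", key, value) != 2)
        return;
    bool b = (strcmp(value, "true") == 0);
    if (strcmp(key, "thread_safe") == 0)
        ext->thread_safe = b;
    else if (strcmp(key, "process_safe") == 0)
        ext->process_safe = b;
}

/* Returns NULL if the extension may be loaded, else an error message in buf. */
const char *check_extension_mode(const ExtensionControl *ext, ServerMode mode,
                                 char *buf, size_t buflen)
{
    bool ok = (mode == SERVER_MODE_THREAD) ? ext->thread_safe : ext->process_safe;

    if (ok)
        return NULL;
    snprintf(buf, buflen, "extension \"%s\" cannot be loaded in %s mode",
             ext->name,
             mode == SERVER_MODE_THREAD ? "multi-threaded" : "multi-process");
    return buf;
}
```

One plausible default, assumed here, is process_safe = true and thread_safe = false when a key is absent, so existing extensions keep working in the classic server and are refused in threaded mode until they opt in.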

We might need new functions in addition to _PG_init, called at connection startup and shutdown. The background worker API probably needs some changes, too.
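One possible shape for per-connection hooks, sketched under assumptions: _PG_init still runs once when the library is loaded and registers callbacks that the server then calls in each connection's thread. RegisterSessionHooks, CallSessionStartHooks and CallSessionEndHooks are hypothetical names, and registration is assumed to happen before sessions run concurrently:

```c
#include <stddef.h>

/* Hypothetical hook type: called in the connection's own thread. */
typedef void (*session_hook_type)(void);

#define MAX_SESSION_HOOKS 16

static session_hook_type session_start_hooks[MAX_SESSION_HOOKS];
static session_hook_type session_end_hooks[MAX_SESSION_HOOKS];
static int n_start_hooks = 0;
static int n_end_hooks = 0;

/* Called from an extension's _PG_init(); returns 0 on success, -1 if full. */
int RegisterSessionHooks(session_hook_type on_start, session_hook_type on_end)
{
    if (n_start_hooks >= MAX_SESSION_HOOKS || n_end_hooks >= MAX_SESSION_HOOKS)
        return -1;
    if (on_start != NULL)
        session_start_hooks[n_start_hooks++] = on_start;
    if (on_end != NULL)
        session_end_hooks[n_end_hooks++] = on_end;
    return 0;
}

/* Server side: run right after a connection starts, in registration order. */
void CallSessionStartHooks(void)
{
    for (int i = 0; i < n_start_hooks; i++)
        session_start_hooks[i]();
}

/* Server side: run before the connection exits, in reverse order. */
void CallSessionEndHooks(void)
{
    for (int i = n_end_hooks - 1; i >= 0; i--)
        session_end_hooks[i]();
}

/* ---- Example extension ---- */
static _Thread_local int my_ext_counter;   /* per-session state of the extension */

static void my_ext_session_start(void) { my_ext_counter = 42; }
static void my_ext_session_end(void) { my_ext_counter = 0; }

void _PG_init(void)
{
    RegisterSessionHooks(my_ext_session_start, my_ext_session_end);
}
```

Running end hooks in reverse order lets an extension loaded later tear down before the ones it may depend on, similar to how atexit() handlers unwind.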

Exposed PIDs

We expose backend process PIDs to users in a few places. pg_stat_activity.pid and pg_terminate_backend(), for example. They need to be replaced, or we can assign a fake PID to each connection when running in multi-threaded mode.
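A sketch of the fake-PID option, assuming a fixed-size connection table protected by a mutex and a counter that hands out integers that are never reused (wraparound is ignored here). A pg_terminate_backend()-style call looks the connection up by its fake PID and sets a flag that the connection thread is assumed to check at its next interrupt check. All names are hypothetical:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONNECTIONS 128

typedef struct Connection
{
    int32_t     fake_pid;              /* 0 = slot free */
    atomic_bool terminate_requested;   /* polled by the connection thread */
} Connection;

static Connection conn_table[MAX_CONNECTIONS];
static pthread_mutex_t conn_table_lock = PTHREAD_MUTEX_INITIALIZER;
static int32_t next_fake_pid = 1;

/* Called at connection start; returns the slot, or NULL if the table is full. */
Connection *connection_register(void)
{
    Connection *c = NULL;

    pthread_mutex_lock(&conn_table_lock);
    for (int i = 0; i < MAX_CONNECTIONS; i++)
    {
        if (conn_table[i].fake_pid == 0)
        {
            c = &conn_table[i];
            c->fake_pid = next_fake_pid++;
            atomic_store(&c->terminate_requested, false);
            break;
        }
    }
    pthread_mutex_unlock(&conn_table_lock);
    return c;
}

void connection_unregister(Connection *c)
{
    pthread_mutex_lock(&conn_table_lock);
    c->fake_pid = 0;
    pthread_mutex_unlock(&conn_table_lock);
}

/* What a pg_terminate_backend(pid)-style function could do in threaded mode. */
bool terminate_by_fake_pid(int32_t pid)
{
    bool found = false;

    if (pid == 0)
        return false;
    pthread_mutex_lock(&conn_table_lock);
    for (int i = 0; i < MAX_CONNECTIONS; i++)
    {
        if (conn_table[i].fake_pid == pid)
        {
            atomic_store(&conn_table[i].terminate_requested, true);
            found = true;
            break;
        }
    }
    pthread_mutex_unlock(&conn_table_lock);
    return found;
}
```

Keeping the value an integer means pg_stat_activity.pid and the existing function signatures would not have to change shape.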

Signals

We use signals for communication between backends. SIGURG in latches, and SIGUSR1 in procsignal, for example. Those primitives need to be rewritten with some other signalling mechanism in multi-threaded mode. In principle, it's possible to set per-thread signal handlers, and send a signal to a particular thread (pthread_kill), but I think it's better to just rewrite them.
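As an illustration of "some other signalling mechanism", here is a minimal latch built on a pthread mutex and condition variable instead of a flag plus SIGURG. ThreadLatch and its functions are hypothetical and are not PostgreSQL's Latch API; the sketch only shows set, timed wait, and reset between threads of one process:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

typedef struct ThreadLatch
{
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            is_set;
} ThreadLatch;

void latch_init(ThreadLatch *l)
{
    pthread_mutex_init(&l->mutex, NULL);
    pthread_cond_init(&l->cond, NULL);
    l->is_set = false;
}

void latch_destroy(ThreadLatch *l)
{
    pthread_cond_destroy(&l->cond);
    pthread_mutex_destroy(&l->mutex);
}

/* Wake the owner: replaces "set flag, then kill(pid, SIGURG)". */
void latch_set(ThreadLatch *l)
{
    pthread_mutex_lock(&l->mutex);
    l->is_set = true;
    pthread_cond_signal(&l->cond);
    pthread_mutex_unlock(&l->mutex);
}

/* Wait until the latch is set or timeout_ms elapses; true if it was set. */
bool latch_wait(ThreadLatch *l, long timeout_ms)
{
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);   /* timedwait uses CLOCK_REALTIME */
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L)
    {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&l->mutex);
    while (!l->is_set)
    {
        /* loop guards against spurious wakeups; nonzero means timeout/error */
        if (pthread_cond_timedwait(&l->cond, &l->mutex, &deadline) != 0)
            break;
    }
    bool set = l->is_set;
    pthread_mutex_unlock(&l->mutex);
    return set;
}

void latch_reset(ThreadLatch *l)
{
    pthread_mutex_lock(&l->mutex);
    l->is_set = false;
    pthread_mutex_unlock(&l->mutex);
}

/* Demo helper: another "backend" thread setting the latch. */
static void *setter_thread(void *arg)
{
    latch_set(arg);
    return NULL;
}
```

Because is_set is checked under the mutex before waiting, a latch_set() that happens before latch_wait() is not lost, which is the same guarantee the signal-based version has to work hard for.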

We also document that you can send SIGINT, SIGTERM or SIGHUP to an individual backend process. I think we need to deprecate that, and maybe come up with a convenient replacement: for example, a Unix domain socket that accepts messages carrying a backend ID, and a new pg_kill executable to send those messages.
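A sketch of what a pg_kill message could look like, assuming a hypothetical 8-byte wire format (backend ID, then action, both in network byte order). A real pg_kill would connect() to a listening socket at some server-chosen path; here socketpair() stands in for that connection so the sketch runs without a server. The action codes and function names are illustrative only:

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical actions replacing SIGINT, SIGTERM and SIGHUP. */
typedef enum { PGKILL_CANCEL = 1, PGKILL_TERMINATE = 2, PGKILL_RELOAD = 3 } PgKillAction;

typedef struct { uint32_t backend_id; uint32_t action; } PgKillMsg;

static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0)
    {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0)
    {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;          /* error or unexpected EOF */
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Client (pg_kill) side: send one request. */
int pgkill_send(int fd, uint32_t backend_id, PgKillAction action)
{
    uint32_t wire[2] = { htonl(backend_id), htonl((uint32_t) action) };
    return write_all(fd, wire, sizeof wire);
}

/* Server side: read one request. */
int pgkill_recv(int fd, PgKillMsg *msg)
{
    uint32_t wire[2];
    if (read_all(fd, wire, sizeof wire) < 0)
        return -1;
    msg->backend_id = ntohl(wire[0]);
    msg->action = ntohl(wire[1]);
    return 0;
}
```

Using a socket also gives a natural place for access control, for example through the socket file's permissions, which per-process signals got implicitly from the OS user model.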

Restart on crash

If a backend process crashes, the postmaster terminates all other backends and restarts the system. That's hard (impossible?) to do safely if everything runs in one process. We can continue to have a separate postmaster process that just monitors the main process and restarts it on crash.
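A minimal sketch of such a monitoring process, assuming the whole multi-threaded server runs in one child process started by fork(). server_main is a stand-in for the real server, and treating both death by signal and a nonzero exit status as a crash is a design choice of this sketch:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*server_main_fn)(int attempt);

/*
 * Fork the server, wait for it, and restart it after a crash.
 * Returns the number of restarts before a clean exit, or -1 on error
 * or when max_restarts is exceeded.
 */
int supervise(server_main_fn server_main, int max_restarts)
{
    int restarts = 0;

    for (int attempt = 0;; attempt++)
    {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(server_main(attempt));      /* child: run the whole server */

        int status;
        while (waitpid(pid, &status, 0) < 0)
        {
            if (errno != EINTR)
                return -1;
        }

        bool crashed = WIFSIGNALED(status) ||
                       (WIFEXITED(status) && WEXITSTATUS(status) != 0);
        if (!crashed)
            return restarts;                  /* clean shutdown */
        if (restarts == max_restarts)
            return -1;                        /* give up */
        restarts++;
    }
}

/* Demo server: crashes (SIGABRT) on the first two attempts, then exits cleanly. */
static int flaky_server(int attempt)
{
    if (attempt < 2)
        abort();
    return 0;
}
```

Since a crash takes down every thread at once, the supervisor no longer has to terminate sibling backends itself; restarting the single child replaces that step.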

Thread-safe libraries

Need to switch to thread-safe versions of library functions, e.g. uselocale() instead of setlocale().
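A small sketch of the uselocale() pattern using the POSIX 2008 newlocale()/uselocale()/freelocale() calls: the locale change applies only to the calling thread, whereas setlocale() changes it for the whole process. The with_thread_locale helper and the formatting callback are hypothetical, and only the always-available "C" locale is used:

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Run fn(arg) with a per-thread locale; false if the locale name is unknown. */
bool with_thread_locale(const char *name, void (*fn)(void *), void *arg)
{
    locale_t loc = newlocale(LC_ALL_MASK, name, (locale_t) 0);
    if (loc == (locale_t) 0)
        return false;
    locale_t old = uselocale(loc);   /* affects this thread only */
    fn(arg);
    uselocale(old);                  /* restore, possibly LC_GLOBAL_LOCALE */
    freelocale(loc);
    return true;
}

/* Example callback: locale-sensitive formatting (LC_NUMERIC picks the decimal point). */
struct fmt_arg
{
    double value;
    char   buf[32];
    bool   thread_locale_active;
};

static void format_number(void *p)
{
    struct fmt_arg *a = p;
    a->thread_locale_active = (uselocale((locale_t) 0) != LC_GLOBAL_LOCALE);
    snprintf(a->buf, sizeof a->buf, "%.1f", a->value);
}
```

Other threads keep formatting with whatever locale they had, which is exactly the property setlocale() cannot provide.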

The Python interpreter has a Global Interpreter Lock (GIL). It's not possible to create two completely independent Python interpreters in the same process; there will be some lock contention on the GIL. Fortunately, the Python community just accepted https://peps.python.org/pep-0684/. That's exactly what we need: it makes it possible for separate interpreters to have their own GILs. It's not clear to me whether that's already in Python 3.12 or still under development for a future version, but by the time we make the switch in Postgres, there will probably be a solution in CPython.

At a quick glance, I think Perl and Tcl are fine: you can have multiple interpreters in one process. We need to check any other libraries we use.

hlinnaka • Jul 19 '23 15:07

Current status: no plans to put any serious efforts into it.

Benefit for Neon: this will help Autoscaling (more connections, better memory management, ...)

andreasscherbaum • Apr 24 '24 16:04

Tasks:

  • [ ] Timeline for implementation and switchover
  • [ ] Timeline for "patches must be ready in order to go into the next release"
  • [ ] Identify potential reviewers
  • [ ] Identify must-have extensions, contact authors

andreasscherbaum • Jun 02 '24 21:06

I will work with Peter Eisentraut on my uselocale patch, fyi.

tristan957 • Jun 03 '24 17:06