Harden against Postgres interactions with C++ destructors
While looking at the C++ guidelines in the CONTRIBUTING.md file proposed in #90, I realized/remembered that Postgres C and regular C++ are not automatically the good friends that the project currently assumes they are.
Specifically, if Postgres throws an ERROR, it uses sigsetjmp and siglongjmp to unwind the stack up to the nearest PG_TRY. This is really problematic when you consider how C++ destructors work, because the siglongjmp jumps right over them. That can cause many different problems, like memory leaks or locks acquired by lock_guard never being released.
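To make this concrete, here is a minimal sketch of the hazard. `ScanSomething` and `cache_mutex` are hypothetical names for illustration, not actual pg_duckdb code:

```cpp
// Build as C++; Postgres headers need C linkage.
extern "C" {
#include "postgres.h"
}
#include <mutex>

static std::mutex cache_mutex; /* hypothetical shared state */

static void
ScanSomething(bool fail)
{
	std::lock_guard<std::mutex> guard(cache_mutex);

	if (fail)
		elog(ERROR, "something went wrong"); /* siglongjmp to the nearest PG_TRY */

	/*
	 * If elog(ERROR) fires, control never reaches the end of this scope:
	 * the stack is unwound with siglongjmp, ~lock_guard() never runs, and
	 * cache_mutex stays locked for the rest of the backend's lifetime.
	 */
}
```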
The other way around is similarly bad: any C++ exception that is not caught will skip over whatever cleanup Postgres does in PG_CATCH, and will thus probably kill the whole Postgres process tree.
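Sketched with a hypothetical `broken_udf`, a C++ exception sails straight past PG_CATCH, because PG_TRY/PG_CATCH are built on sigsetjmp and know nothing about C++ stack unwinding:

```cpp
extern "C" {
#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(broken_udf);
}
#include <stdexcept>

/* Hypothetical C++ helper that fails with a plain C++ exception. */
static void
DoDuckDBWork()
{
	throw std::runtime_error("duckdb failed");
}

extern "C" Datum
broken_udf(PG_FUNCTION_ARGS)
{
	PG_TRY();
	{
		DoDuckDBWork();
	}
	PG_CATCH();
	{
		/*
		 * Never reached: PG_CATCH is only entered via siglongjmp, so it
		 * sees Postgres ERRORs but not C++ exceptions.
		 */
		PG_RE_THROW();
	}
	PG_END_TRY();

	/*
	 * The exception unwinds straight out of the extension. With no C++
	 * handler left on the stack, the runtime calls std::terminate(), the
	 * backend aborts, and the postmaster restarts every other backend.
	 */
	PG_RETURN_VOID();
}
```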
This is all covered in the Postgres docs as well: https://www.postgresql.org/docs/current/xfunc-c.html#EXTEND-CPP
The only way to fix these problems in a manageable way, I think, is to make the boundaries between calling C++ from C and C from C++ much clearer than they are now (possibly by never including both duckdb headers and Postgres headers in the same file), and then, at these boundaries, convert C++ exceptions to Postgres errors and the other way around.
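As a rough illustration of what such a boundary could look like (function names here are hypothetical and the error handling is simplified):

```cpp
extern "C" {
#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(duckdb_scan);
}
#include <stdexcept>
#include <string>

/* Hypothetical stand-in for real DuckDB work. */
static void
RunDuckDBQuery()
{
	throw std::runtime_error("query failed");
}

/* C++ -> Postgres boundary: turn any C++ exception into a Postgres ERROR. */
extern "C" Datum
duckdb_scan(PG_FUNCTION_ARGS)
{
	try
	{
		RunDuckDBQuery();
	}
	catch (const std::exception &e)
	{
		ereport(ERROR, (errmsg("duckdb: %s", e.what())));
	}
	catch (...)
	{
		ereport(ERROR, (errmsg("duckdb: unknown C++ exception")));
	}
	PG_RETURN_VOID();
}

/* Postgres -> C++ boundary: turn a Postgres ERROR into a C++ exception. */
static void
CallPostgresGuarded(void (*fn)(void))
{
	MemoryContext oldcontext = CurrentMemoryContext;

	PG_TRY();
	{
		fn();
	}
	PG_CATCH();
	{
		/*
		 * PG_CATCH has already restored the exception stack, so it is
		 * safe to leave this block with a C++ throw.
		 */
		MemoryContextSwitchTo(oldcontext);
		ErrorData *edata = CopyErrorData();
		std::string message(edata->message);
		FlushErrorState();
		FreeErrorData(edata);
		throw std::runtime_error(message);
	}
	PG_END_TRY();
}
```

With wrappers like these at every crossing, a Postgres ERROR raised inside `fn()` surfaces on the C++ side as an ordinary exception (so destructors run), and a DuckDB exception surfaces on the Postgres side as an ordinary ERROR (so PG_CATCH cleanup runs).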
P.S. The same problem applies to Rust; pgrx worked around it by implementing panic -> pgerror and pgerror -> panic conversion at every FFI boundary: https://github.com/pgcentralfoundation/pgrx/blob/develop/docs/src/design-decisions.md#protecting-the-rust-stack-read-lying-to-postgres
Removing the bug label, as this does not have a specific bug report to fix but is instead an approach to improve overall error handling. That said, we have agreed that this is a good idea, and we want to move towards having all interactions between C and C++ live in a single file, similar to how you would use FFI -- but we won't try to catch memory allocation errors.
I'm going to move this issue to 0.2.0 so we can start stability testing for 0.1.0. This is an important issue, but we've fixed the worst cases and will address more in the next release.