
Support parallel DuckDB threads for Postgres table scan

Open YuweiXiao opened this issue 7 months ago • 1 comments

Currently, we use a single DuckDB thread for the Postgres table scan, even though multiple Postgres workers are initialized. This creates a performance bottleneck when scanning large amounts of data.

This PR parallelizes the conversion from Postgres tuples to DuckDB data chunks. Below are benchmark results on a 5 GB TPC-H lineitem table.

  • Benchmark query: `select * from lineitem order by 1 limit 1`
  • Other GUC settings: `duckdb.max_workers_per_postgres_scan = 2`

| Threads (`duckdb.threads_for_postgres_scan`) | Cost (seconds) |
|---|---|
| 1 | 15.8 |
| 2 | 8.7 |
| 4 | 5.8 |

YuweiXiao avatar May 07 '25 07:05 YuweiXiao

@JelteF Thanks for the review! Yes, targeting 1.1.0 is reasonable.

YuweiXiao avatar May 07 '25 12:05 YuweiXiao

Do you plan on addressing the review feedback? I'm considering merging this for 1.0 anyway if it's in a good state.

JelteF avatar May 30 '25 09:05 JelteF

Yeah, that would be nice! Let me resolve the conflict first.

YuweiXiao avatar May 30 '25 09:05 YuweiXiao

@JelteF Hey, the restrictions on unsafe types like JSON/LIST have been removed by converting Postgres slots into DuckDB data chunks in a columnar fashion. If any other unsafe type is supported in the future, one only needs to add it to `IsThreadSafeTypeForPostgresToDuckDB`.

By the way, the columnar conversion could be optimized further by eliminating the per-value if-else branches (and switch statements), though that may require a large amount of refactoring.

YuweiXiao avatar Jun 09 '25 07:06 YuweiXiao