Support parallel DuckDB threads for Postgres table scan
Currently, we use a single DuckDB thread for Postgres table scans, even though multiple Postgres workers are initialized. This becomes a performance bottleneck when scanning large amounts of data.
This PR parallelizes the conversion from Postgres tuples to DuckDB data chunks. Below are benchmark results on a 5GB TPC-H lineitem table.
- Benchmark query: `select * from lineitem order by 1 limit 1`
- Other GUC setup: `duckdb.max_workers_per_postgres_scan = 2`
| Threads (`duckdb.threads_for_postgres_scan`) | Cost (seconds) |
|---|---|
| 1 | 15.8 |
| 2 | 8.7 |
| 4 | 5.8 |
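For readers unfamiliar with the approach, here is a minimal, self-contained sketch of the row-range parallelization described above. It is illustrative only, not the pg_duckdb implementation: `Datum`, `ConvertSlotRange`, and `ParallelConvert` are stand-ins, and the real code converts full tuples into a `duckdb::DataChunk` rather than a single integer column.

```cpp
// Illustrative sketch only -- not the actual pg_duckdb code.
// Each thread converts a disjoint range of rows into the output column,
// so no synchronization is needed on the output as long as every
// column type's conversion is thread-safe.
#include <algorithm>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

using Datum = int64_t; // stand-in for a Postgres Datum

// Hypothetical per-range conversion: decode rows [begin, end) of the
// fetched slots into the same rows of the output column.
static void ConvertSlotRange(const std::vector<Datum> &slots,
                             std::vector<int64_t> &column,
                             size_t begin, size_t end) {
	for (size_t row = begin; row < end; row++) {
		column[row] = slots[row]; // real code would decode the Datum
	}
}

static void ParallelConvert(const std::vector<Datum> &slots,
                            std::vector<int64_t> &column,
                            size_t num_threads) {
	const size_t row_count = slots.size();
	const size_t per_thread = (row_count + num_threads - 1) / num_threads;
	std::vector<std::thread> workers;
	for (size_t t = 0; t < num_threads; t++) {
		size_t begin = t * per_thread;
		size_t end = std::min(begin + per_thread, row_count);
		if (begin >= end) {
			break;
		}
		// Threads write to disjoint row ranges, so no locking is needed.
		workers.emplace_back(ConvertSlotRange, std::cref(slots),
		                     std::ref(column), begin, end);
	}
	for (auto &worker : workers) {
		worker.join();
	}
}

int main() {
	std::vector<Datum> slots(1 << 20, 42);
	std::vector<int64_t> column(slots.size());
	ParallelConvert(slots, column, 4); // cf. duckdb.threads_for_postgres_scan
	return column.back() == 42 ? 0 : 1;
}
```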
@JelteF Thanks for the review! Yes, going for 1.1.0 is reasonable.
Do you plan on addressing the review feedback? I'm considering merging this into 1.0 anyway if it's in a good state.
Yeah, that would be nice! Let me resolve the conflict first.
@JelteF Hey, the restrictions on unsafe types like JSON/LIST have been removed by converting Postgres slots into DuckDB data chunks in a columnar fashion. If any other unsafe type is supported in the future, one only needs to add it to `IsThreadSafeTypeForPostgresToDuckDB`.
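As a rough illustration of that whitelist pattern — the actual signature and OID set of `IsThreadSafeTypeForPostgresToDuckDB` in pg_duckdb may well differ — such a check could look like this:

```cpp
// Hypothetical sketch of a thread-safety whitelist for the columnar
// conversion; the real IsThreadSafeTypeForPostgresToDuckDB may differ
// in signature and in the exact set of supported type OIDs.
#include <cstdint>

using Oid = uint32_t;

// Built-in Postgres type OIDs (these values are stable across versions).
constexpr Oid BOOLOID = 16;
constexpr Oid INT8OID = 20;
constexpr Oid INT4OID = 23;
constexpr Oid TEXTOID = 25;
constexpr Oid FLOAT8OID = 701;

static bool IsThreadSafeTypeForPostgresToDuckDB(Oid type_oid) {
	switch (type_oid) {
	// Types whose conversion touches no shared Postgres state can be
	// decoded by multiple DuckDB threads at once.
	case BOOLOID:
	case INT8OID:
	case INT4OID:
	case TEXTOID:
	case FLOAT8OID:
		return true;
	default:
		// Types not listed here fall back to a non-parallel path;
		// supporting a new type means adding its OID above once its
		// conversion is known to be thread-safe.
		return false;
	}
}
```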
By the way, the columnar conversion could be further optimized by eliminating the per-value if-else branches (and switch statements). That may involve a large amount of code refactoring, though.
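One standard way to remove that per-value dispatch — purely a sketch of the idea, not what pg_duckdb does today — is to resolve a conversion function pointer once per column, so the inner row loop runs without any type branching:

```cpp
// Hypothetical sketch of eliminating per-value type dispatch: the
// switch on the type OID runs once per column instead of once per
// value, and the inner loop calls a pre-resolved function pointer.
#include <cstddef>
#include <cstdint>

using Oid = uint32_t;
using Datum = int64_t; // stand-in for a Postgres Datum

// Illustrative signature for a per-column conversion routine.
using ConvertFn = void (*)(const Datum *in, int64_t *out, size_t count);

static void ConvertInt8(const Datum *in, int64_t *out, size_t count) {
	for (size_t i = 0; i < count; i++) {
		out[i] = in[i]; // real code would use DatumGetInt64
	}
}

static void ConvertBool(const Datum *in, int64_t *out, size_t count) {
	for (size_t i = 0; i < count; i++) {
		out[i] = (in[i] != 0) ? 1 : 0;
	}
}

// Resolved once when the scan starts, not once per value.
static ConvertFn ResolveConversion(Oid type_oid) {
	switch (type_oid) {
	case 20: // INT8OID
		return ConvertInt8;
	case 16: // BOOLOID
		return ConvertBool;
	default:
		return nullptr; // fall back to the generic branchy path
	}
}
```

Templated per-type loops would achieve the same effect with full inlining, at the cost of more generated code, which is likely where the larger refactoring effort would come in.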