[Feature][Jdbc] Support read multiple tables by regular expressions
…ssions in PostgreSQL/mysql/oracle/sqlserver, and fix the null pointer issue in the regular expression https://github.com/apache/seatunnel/issues/9209
Purpose of this pull request
added contents about multiple tables and regular expressions in PostgreSQL/mysql/oracle/sqlserver, and fix the null pointer issue in the regular expression.
Does this PR introduce any user-facing change?
Yes. This PR enhances the table_list parameter to directly support regex patterns for table filtering, while maintaining backward compatibility. Here's the detailed breakdown:
New Feature: Direct Use of Regular Expressions in table_path Purpose: Allow users to write regular expressions directly in the table_path field within table_list to filter tables. Example Configuration: "table_list"=[ { "table_path"="TEST.TEST_DB_*" # Matches all tables with the "TEST_DB" prefix } ] This configuration matches all tables prefixed with TEST_DB (e.g., TEST_DB_2023, TEST_DB_2024).
Improvement and Fix: Enhanced the robustness of the approximateRowCntStatement method in OracleDialect. Fixed a null pointer error in Oracle when executing queries with empty or invalid parameters. Key Notes: The table_path now supports regex syntax (e.g., TEST.TEST_DB_* → matches all tables under the TEST schema with names starting with TEST_DB_). The Oracle fix ensures stable execution of row count estimation logic, avoiding crashes due to unhandled edge cases.
How was this patch tested?
Testing Environment OS: Linux (Ubuntu 20.04) SeaTunnel Version: 2.3.9 Execution Mode: Flink on YARN (yarn-application) Databases: PostgreSQL 16.0(Source) Iceberg (Sink)
Test Configuration: Created a configuration file pg2iceberg.conf to read tables matching the regex TEST.TEST_DB_* from PostgreSQL: { env { execution.parallelism = 1 job.mode = "BATCH" job.name = "seatunnel_batch_job" }
source配置 source { JDBC { url = "jdbc:postgresql://xxxxxxx:xxxxx/xxxxx" driver = "org.postgresql.Driver" user = "xxxxxxxx" password = "xxxxxxx" "table_list" = [ { "table_path" = "postgres.public.test_db_2.*" } ] split.size = 5000 fetch_size = 2000 } }
sink配置 sink { Iceberg { ........ } } }
Execution Command: ./bin/start-seatunnel-flink-15-connector-v2.sh --config pg2iceberg.conf --deploy-mode run-application --target yarn-application --name multitable_pg2Iceberg
Check log: Filtering tables with regex pattern: postgres.public.test_db_2.* Found regex match table: postgres.public.test_db_20 Found regex match table: postgres.public.test_db_21 Found regex match table: postgres.public.test_db_22 Found regex match table: postgres.public.test_db_20250324 Found regex match table: postgres.public.test_db_202502 Found regex match table: postgres.public.test_db_202501 Total tables matched after filtering: 6
Check list
- [ ] If any new Jar binary package adding in your PR, please add License Notice according New License Guide
- [ * ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs
- [ * ] If you are contributing the connector code, please check that the following files are updated:
- Update plugin-mapping.properties and add new connector information in it
- Update the pom file of seatunnel-dist
- Add ci label in label-scope-conf
- Add e2e testcase in seatunnel-e2e
- Update connector plugin_config
I think we need to add a option to mark whether the TablePath is a regular expression.
1、Simplified Path Matching Design,If you enter a path like /project/table_123, the system first checks for this exact path,If no exact match exists, it automatically treats your input as a regex pattern. The main advantages are as follows:
- Zero configuration: There is no need for marking parameters like "is_regex", and users don't have to switch modes manually. Just enter the input and it will be matched, reducing the cognitive burden.
- Intuitive behavior: It follows a logic similar to that of the grep tool, First, perform a literal match, and then enable pattern matching.
2、Like Flink CDC (where you directly use exact table names OR regex patterns without special parameters), we keep things simple.
The modification has been made. I hope you can review it when you are free.
good job. This is already close to merge. Please add e2e
good job. This is already close to merge. Please add e2e
E2E tests for MySQL and Oracle have been added.