feat: Add Data Sync Plugin for external database synchronization
Purpose
Fixes: #72 /claim #72 Implement data synchronization functionality between external data sources and StarbaseDB's internal SQLite database. This plugin enables close-to-edge replica creation by automatically pulling and synchronizing data from external sources (like PostgreSQL) at configurable intervals.
Tasks
-
[x] Implement core Data Sync Plugin
- [x] Configurable sync intervals via
sync_intervaloption - [x] Selective table synchronization via
tablesoption - [x] Incremental updates using
created_atandidcolumns - [x] PostgreSQL to SQLite type mapping
- [x] Metadata tracking for sync state
- [x] Configurable sync intervals via
-
[x] Add plugin configuration support
- [x] Wrangler.toml configuration options
- [x] Environment variable support for sensitive data
- [x] Plugin registration system integration
-
[x] Implement sync mechanisms
- [x] Pull-based synchronization
- [x] Schema auto-detection and mapping
- [x] Incremental data fetching
- [x] Error handling and retry logic
-
[x] Create comprehensive demo and documentation
- [x] Testing environment with Docker PostgreSQL
- [x] Example endpoints for monitoring
- [x] Query hook demonstrations
- [x] Security best practices
Verify
- Start a PostgreSQL instance:
docker run --name starbasedb-postgres -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=demo -p 5432:5432 -d postgres:15
- Configure the plugin in
wrangler.toml:
[plugins.data-sync]
sync_interval = 300 # 5 minutes
tables = ["users", "products"]
-
Set up environment variables for database credentials
-
Test synchronization:
- Monitor sync status:
curl http://localhost:8787/sync-status - View synced data:
curl http://localhost:8787/sync-data - Check sync metadata:
curl http://localhost:8787/debug
Before
StarbaseDB instances could only:
- Use internal SQLite database
- Connect to external databases for direct queries
- No automatic data synchronization
- No edge-replica capabilities
After
StarbaseDB now supports:
- Automatic data synchronization from external sources
- Configurable sync intervals and table selection
- Incremental updates based on timestamps and IDs
- Type mapping between PostgreSQL and SQLite
- Close-to-edge replica functionality
- Monitoring and debugging capabilities
Put a few thoughts on the pull request. I still need to run through the demo and test, just starting with a code review.
One question I have: is there any way to abstract the Postgres specific code out so we could easily support other types of databases easily in the future? For example the column mappings between Postgres <> SQLite would obviously only work if the user connects a Postgres database to it, and would break likely for MySQL. Maybe there is a high level plugin like
DataSyncPluginand then another plugin you can pass inside of it that can override functions that would be database specific like:new DataSyncPlugin(syncSource: PostgresSync | MySQLSync | etc)And maybe
PostgresSyncoverrides some default class implementation and all the database specific logic (e.g. calling to the information_schema & column type mapping) could exist in there?Thoughts?
It's best actually, let me see how to implement it.
@Brayden
Refactored the data sync plugin to:
- Abstract database-specific code into DatabaseSyncSource class
- Add tmp_ prefix to all synced tables
- Support custom schemas with proper table name mapping
- Add flexible sync configuration (custom columns, batch sizes)
- Improve type mapping and validation
- Add comprehensive error handling and logging
- Removed demo test; not working.
@onyedikachi-david Will test this implementation this week and report back!
I'm trying to test this and running into a few snags. Maybe @onyedikachi-david you can help check what I might be missing in my test here so I can get it working as expected.
wrangler.toml
[plugins.data-sync]
sync_interval = 300 # 5 minutes
tables = ["users", "person"]
#...
[vars]
# ...
EXTERNAL_DB_TYPE = "postgresql"
EXTERNAL_DB_HOST = "*******.us-east-1.rds.amazonaws.com"
EXTERNAL_DB_PORT = 5432
EXTERNAL_DB_USER = "postgres"
EXTERNAL_DB_PASS = "password"
EXTERNAL_DB_DATABASE = "postgres"
./src/index.ts
const syncStore = new PostgresSyncSource({
dialect: 'postgresql'
})
const plugins = [
// ...
new DataSyncPlugin(syncStore, { sync_interval: 60, tables: ["users", "person"]})
] satisfies StarbasePlugin[]
After I deploy I can see a new table is created for what I imagine is meant to be the users table but instead it's called users.table and I never see one get created for my person table.
When I try to open that users.table to see if it has any contents I get this error from doing so:
Error: Error invoking remote method 'query': TypeError: Cannot read properties of undefined (reading 'columns')
Can you help share some steps to get it working where I can include it in my ./src/index.ts plugins array? I tried replicating what was in the ./plugins/postgres-sync/README.md file but seem to have been unsuccessful.
I'll look into this ASAP.