starbasedb icon indicating copy to clipboard operation
starbasedb copied to clipboard

feat: Add Data Sync Plugin for external database synchronization

Open onyedikachi-david opened this issue 11 months ago • 5 comments

Purpose

Fixes: #72 /claim #72 Implement data synchronization functionality between external data sources and StarbaseDB's internal SQLite database. This plugin enables close-to-edge replica creation by automatically pulling and synchronizing data from external sources (like PostgreSQL) at configurable intervals.

Tasks

  • [x] Implement core Data Sync Plugin

    • [x] Configurable sync intervals via sync_interval option
    • [x] Selective table synchronization via tables option
    • [x] Incremental updates using created_at and id columns
    • [x] PostgreSQL to SQLite type mapping
    • [x] Metadata tracking for sync state
  • [x] Add plugin configuration support

    • [x] Wrangler.toml configuration options
    • [x] Environment variable support for sensitive data
    • [x] Plugin registration system integration
  • [x] Implement sync mechanisms

    • [x] Pull-based synchronization
    • [x] Schema auto-detection and mapping
    • [x] Incremental data fetching
    • [x] Error handling and retry logic
  • [x] Create comprehensive demo and documentation

    • [x] Testing environment with Docker PostgreSQL
    • [x] Example endpoints for monitoring
    • [x] Query hook demonstrations
    • [x] Security best practices

Verify

  1. Start a PostgreSQL instance:
docker run --name starbasedb-postgres -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=demo -p 5432:5432 -d postgres:15
  1. Configure the plugin in wrangler.toml:
[plugins.data-sync]
sync_interval = 300 # 5 minutes
tables = ["users", "products"]
  1. Set up environment variables for database credentials

  2. Test synchronization:

  • Monitor sync status: curl http://localhost:8787/sync-status
  • View synced data: curl http://localhost:8787/sync-data
  • Check sync metadata: curl http://localhost:8787/debug

Before

StarbaseDB instances could only:

  • Use internal SQLite database
  • Connect to external databases for direct queries
  • No automatic data synchronization
  • No edge-replica capabilities

After

image image

StarbaseDB now supports:

  • Automatic data synchronization from external sources
  • Configurable sync intervals and table selection
  • Incremental updates based on timestamps and IDs
  • Type mapping between PostgreSQL and SQLite
  • Close-to-edge replica functionality
  • Monitoring and debugging capabilities

onyedikachi-david avatar Jan 24 '25 23:01 onyedikachi-david

Put a few thoughts on the pull request. I still need to run through the demo and test, just starting with a code review.

One question I have: is there any way to abstract the Postgres specific code out so we could easily support other types of databases easily in the future? For example the column mappings between Postgres <> SQLite would obviously only work if the user connects a Postgres database to it, and would break likely for MySQL. Maybe there is a high level plugin like DataSyncPlugin and then another plugin you can pass inside of it that can override functions that would be database specific like:

new DataSyncPlugin(syncSource: PostgresSync | MySQLSync | etc)

And maybe PostgresSync overrides some default class implementation and all the database specific logic (e.g. calling to the information_schema & column type mapping) could exist in there?

Thoughts?

It's best actually, let me see how to implement it.

onyedikachi-david avatar Jan 26 '25 17:01 onyedikachi-david

@Brayden

Refactored the data sync plugin to:

  • Abstract database-specific code into DatabaseSyncSource class
  • Add tmp_ prefix to all synced tables
  • Support custom schemas with proper table name mapping
  • Add flexible sync configuration (custom columns, batch sizes)
  • Improve type mapping and validation
  • Add comprehensive error handling and logging
  • Removed demo test; not working.

onyedikachi-david avatar Jan 27 '25 17:01 onyedikachi-david

@onyedikachi-david Will test this implementation this week and report back!

Brayden avatar Feb 03 '25 15:02 Brayden

I'm trying to test this and running into a few snags. Maybe @onyedikachi-david you can help check what I might be missing in my test here so I can get it working as expected.

wrangler.toml

[plugins.data-sync]
sync_interval = 300 # 5 minutes
tables = ["users", "person"]

#...

[vars]
# ...
EXTERNAL_DB_TYPE = "postgresql"
EXTERNAL_DB_HOST = "*******.us-east-1.rds.amazonaws.com"
EXTERNAL_DB_PORT = 5432
EXTERNAL_DB_USER = "postgres"
EXTERNAL_DB_PASS = "password"
EXTERNAL_DB_DATABASE = "postgres"

./src/index.ts

const syncStore = new PostgresSyncSource({
    dialect: 'postgresql'
})

const plugins = [
    // ...
    new DataSyncPlugin(syncStore, { sync_interval: 60, tables: ["users", "person"]})
] satisfies StarbasePlugin[]

After I deploy I can see a new table is created for what I imagine is meant to be the users table but instead it's called users.table and I never see one get created for my person table. image

When I try to open that users.table to see if it has any contents I get this error from doing so:

Error: Error invoking remote method 'query': TypeError: Cannot read properties of undefined (reading 'columns')

Can you help share some steps to get it working where I can include it in my ./src/index.ts plugins array? I tried replicating what was in the ./plugins/postgres-sync/README.md file but seem to have been unsuccessful.

Brayden avatar Feb 09 '25 01:02 Brayden

I'll look into this ASAP.

onyedikachi-david avatar Feb 10 '25 18:02 onyedikachi-david