stripe-sync-engine
stripe-sync-engine copied to clipboard
RFC: loading existing data
It would be nice to be able to load existing Stripe data into a database. From a high level:
- Bulk fetch all Stripe data for a single entity (e.g.
customer) - Bulk insert/upsert data into the postgres table (e.g.
stripe.customers)
Possible options
1. Don't do it
Just let the user download a CSV then COPY to their database
2. Upload CSV
Create an endpoint to POST a CSV, then the server can upload the data. eg:
// POST multipart /customers/load
// => { csv: file }
fastify.post('/', async function (req, reply) {
const data = await req.csv()
// upload data to Postgres database
reply.send()
})
3. Fetch data from URL
Create an endpoint to fetch a CSV, then the server can upload the data. eg:
// PATCH /customers/load
// => { csv_url: file }
fastify.path('/', async function (req, reply) {
// fetch file from URL
// upload data to Postgres database
reply.send()
})
4. Loop through Stripe API
Loop through all the stripe data via the Stripe API (but this could have rate-limiting limitations)
Regarding #4 and rate limits... the best way to design this is to pull data in batches sorted on an entry timestamp (if that's available in the Stripe API -- I'd have to look it up.). But basically just pull all the data in batches and set a high watermark, then use the watermark to keep things in sync.
This could all be done in a Postgres function using the http extension, or using copy from program with any command-line tool that can talk to the Stripe API.
EDIT: Stripe has pagination available for doing large bulk loading: https://stripe.com/docs/api/pagination
Could use singer but it looks like it has a few limitations:
(from https://github.com/datamill-co/target-postgres)
- de-nests everything into separate tables (rather than using using a
jsonbdatatype) - the JSON Schema seems to map the timestamps to strings (I've been using numbers). Ideally I'd like to use numbers (to match stripe "exactly") and then generated columns to create timestamps
I've thought about the best way to do this and I think the best "v1" would be:
- Add a
synctask to each object - this will fetch all the elements on stripe and loop through them -> calling the
upsertfunction for each individually
This will be a bit slower than doing a full insert into the database, but it ensures data integrity by backfilling. We might improve this later by bulk uploading
Option 4 is implemented here https://github.com/supabase/stripe-sync-engine/pull/45