stripe-sync-engine icon indicating copy to clipboard operation
stripe-sync-engine copied to clipboard

RFC: loading existing data

Open kiwicopple opened this issue 4 years ago • 4 comments

It would be nice to be able to load existing Stripe data into a database. From a high level:

  1. Bulk fetch all Stripe data for a single entity (e.g. customer)
  2. Bulk insert/upsert data into the postgres table (e.g. stripe.customers)

Possible options

1. Don't do it

Just let the user download a CSV then COPY to their database

2. Upload CSV

Create an endpoint to POST a CSV, then the server can upload the data. eg:

// POST multipart /customers/load
// => { csv: file }
fastify.post('/', async function (req, reply) {

  const data = await req.csv()
  
  // upload data to Postgres database
  
  reply.send()
})

3. Fetch data from URL

Create an endpoint to fetch a CSV, then the server can upload the data. eg:

// PATCH /customers/load
// => { csv_url: file }
fastify.path('/', async function (req, reply) {

  // fetch file from URL
  // upload data to Postgres database
  
  reply.send()
})

4. Loop through Stripe API

Loop through all the stripe data via the Stripe API (but this could have rate-limiting limitations)

kiwicopple avatar May 01 '21 02:05 kiwicopple

Regarding #4 and rate limits... the best way to design this is to pull data in batches sorted on an entry timestamp (if that's available in the Stripe API -- I'd have to look it up.). But basically just pull all the data in batches and set a high watermark, then use the watermark to keep things in sync.

This could all be done in a Postgres function using the http extension, or using copy from program with any command-line tool that can talk to the Stripe API.

EDIT: Stripe has pagination available for doing large bulk loading: https://stripe.com/docs/api/pagination

burggraf avatar May 01 '21 03:05 burggraf

Could use singer but it looks like it has a few limitations:

(from https://github.com/datamill-co/target-postgres)

  • de-nests everything into separate tables (rather than using using a jsonb datatype)
  • the JSON Schema seems to map the timestamps to strings (I've been using numbers). Ideally I'd like to use numbers (to match stripe "exactly") and then generated columns to create timestamps

kiwicopple avatar May 04 '21 06:05 kiwicopple

I've thought about the best way to do this and I think the best "v1" would be:

  • Add a sync task to each object
  • this will fetch all the elements on stripe and loop through them -> calling the upsert function for each individually

This will be a bit slower than doing a full insert into the database, but it ensures data integrity by backfilling. We might improve this later by bulk uploading

kiwicopple avatar Jun 09 '21 05:06 kiwicopple

Option 4 is implemented here https://github.com/supabase/stripe-sync-engine/pull/45

phamhieu avatar Feb 14 '22 02:02 phamhieu