tap-google-sheets
tap-google-sheets copied to clipboard
Reloads Entire Sheet Each Load
After setting up the integration with Stitch, I have found that every time the data is loaded into the warehouse, every single row is re-imported during the process. This is going to quickly increase the monthly row count for the integration. Is there a way to add only the new rows during each import?
After doing some preliminary research there are 3 realistic options because unfortunately google sheets api does not provide any way to know which rows / cells have been updated (there's no metadata on what timestamp it was updated):
- Stitch adds an onEdit trigger to add a Note to the corresponding cell indicating a modified timestamp or similar metadata. It would also require appropriate permissions and good documentation for customers to explain this. It's more work to develop but it's my favorite customer experience.
- Saving an old version of the sheet when sync is done in the state and comparing the old and new google sheets to create a diff and then only emitting updated rows. If space is an issue then saving just a hash per row is also an option.
- Providing the customer an option to set a custom replication key and requiring work on their part to include in that column a valid timestamp when the row was updated. This is the least amount of development work, but puts a burden on the user.
The work needs to be prioritized either way.