Feat: Incremental Sync for providers

Open mit-27 opened this issue 1 year ago • 0 comments

Problem:

  • When a sync is triggered, data is fetched from the beginning for each provider. This is time-consuming and retrieves data that we already have.
  • Also, several providers require pagination to pull all the data from the source into the Panora database.

Possible Solution:

  • Currently, we have a connection associated with every provider. So, we could preserve the sync state of every object for every provider connection by storing it in a separate table.
  • Each provider uses a different pagination method. To address this and to keep track of the last sync state, I’ve created the following table. It could be improved, but it represents the general idea.
-- ************************************** vertical_objects_sync_track_data
-- One row per synced object of a provider connection.
CREATE TABLE vertical_objects_sync_track_data
(
 id_vertical_objects_sync_track_data uuid NOT NULL,
 vertical                            text NOT NULL,
 provider_slug                       text NOT NULL,
 object                              text NOT NULL,
 pagination_type                     text NOT NULL, -- pagination method the provider uses
 id_connection                       uuid NOT NULL, -- connection this sync state belongs to
 data                                json,          -- last-sync/pagination state, shaped per pagination_type
 CONSTRAINT PK_vertical_objects_sync_track_data PRIMARY KEY ( id_vertical_objects_sync_track_data ),
 CONSTRAINT FK_connection FOREIGN KEY ( id_connection ) REFERENCES connections ( id_connection )
);
  • With this logic, we could preserve the sync state for each object. Storing the pagination type alongside generic JSON pagination data should let us handle any provider's pagination method.
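
To make the idea more concrete, here is a rough sketch of how a sync could resume from the stored state. It is only an illustration under assumptions, not the actual implementation: the `SyncState` variants, `SyncTrackStore`, `incrementalSync`, and `fakeOffsetProvider` are hypothetical names, and the in-memory map stands in for the `vertical_objects_sync_track_data` table.

```typescript
// Hypothetical shapes for the `data` JSON column, discriminated by `pagination_type`.
type SyncState =
  | { pagination_type: "cursor"; data: { cursor: string | null } }
  | { pagination_type: "offset"; data: { offset: number } }
  | { pagination_type: "timestamp"; data: { last_synced_at: string } };

// One page of provider results plus the state to store once the page is saved.
interface Page<T> {
  items: T[];
  nextState: SyncState;
  hasMore: boolean;
}

// Mirrors the identifying columns of the proposed table.
interface TrackKey {
  id_connection: string;
  vertical: string;
  provider_slug: string;
  object: string;
}

// In-memory stand-in for vertical_objects_sync_track_data (assumption: one row per key).
class SyncTrackStore {
  private rows = new Map<string, SyncState>();

  private keyOf(k: TrackKey): string {
    return [k.id_connection, k.vertical, k.provider_slug, k.object].join("|");
  }

  load(k: TrackKey): SyncState | undefined {
    return this.rows.get(this.keyOf(k));
  }

  save(k: TrackKey, state: SyncState): void {
    this.rows.set(this.keyOf(k), state);
  }
}

// A provider-specific function that fetches the page following the given state.
type FetchPage<T> = (state: SyncState | undefined) => Promise<Page<T>>;

// Resumes from the stored state, saves each page, and records progress after every page.
async function incrementalSync<T>(
  key: TrackKey,
  store: SyncTrackStore,
  fetchPage: FetchPage<T>,
  persist: (items: T[]) => Promise<void>
): Promise<number> {
  let state = store.load(key);
  let synced = 0;
  let hasMore = true;
  while (hasMore) {
    const page = await fetchPage(state);
    await persist(page.items);
    state = page.nextState;
    store.save(key, state); // progress survives an interrupted sync
    synced += page.items.length;
    hasMore = page.hasMore;
  }
  return synced;
}

// Fake offset-paginated provider over an array, for illustration only.
function fakeOffsetProvider<T>(source: T[], pageSize: number): FetchPage<T> {
  return async (state) => {
    const offset =
      state !== undefined && state.pagination_type === "offset" ? state.data.offset : 0;
    const items = source.slice(offset, offset + pageSize);
    const nextOffset = offset + items.length;
    return {
      items,
      nextState: { pagination_type: "offset", data: { offset: nextOffset } },
      hasMore: nextOffset < source.length,
    };
  };
}
```

Because the state is written after every page, a second run with the same key only requests what comes after the stored position. A provider with cursor or timestamp pagination would supply its own `FetchPage` and read the matching `SyncState` variant.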

Note:

  • I have implemented this logic for Attio as an example to illustrate the idea.

mit-27 avatar Sep 16 '24 23:09 mit-27