
Incremental sync flaw

Open gnilrets opened this issue 1 year ago • 2 comments

I'm running this tap through Meltano. I've been running it since March 2023 and have noticed that it occasionally misses records. If I do a full re-sync, the missing records show up, so something about the state handling does not seem to be working properly. Have you seen this before?

gnilrets, Jun 20 '23

I'm fairly sure the issue is caused by a flaw in Jira's pagination. When we query for all issues updated since the last sync, there may be more records than Jira returns in one response, so it returns the first N and indicates that there are M more records. However, Jira does not appear to cache the query results between requests. So when we re-run the API call and request records starting from record N+1, the underlying results may have changed in the meantime.

For example, suppose we have 5 issues (ISSUE-1 through ISSUE-5), updated one second apart, and we want to return all of them ordered by the updated timestamp with maxResults: 3. For the first request, I submit startAt: 0 and get:

| key | updated |
|---|---|
| ISSUE-1 | 2022-06-27 00:00:00 |
| ISSUE-2 | 2022-06-27 00:00:01 |
| ISSUE-3 | 2022-06-27 00:00:02 |

The API returns isLast: False and total: 5. I then submit a request with startAt: 3 and expect to get the remaining two issues, ISSUE-4 and ISSUE-5. However, suppose ISSUE-2 is updated right before this second request is made. ISSUE-2 now sorts last, so ISSUE-4 shifts into the first three positions and ISSUE-2 moves into the second page:

| key | updated |
|---|---|
| ISSUE-5 | 2022-06-27 00:00:04 |
| ISSUE-2 | 2022-06-27 01:23:01 |

The consequence is that ISSUE-4 is never seen in the paginated results, while ISSUE-2 is returned twice.
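The failure can be reproduced with a small in-memory simulation. This is a sketch under stated assumptions: `search`, `paginate_by_offset`, and `touch_issue_2` are hypothetical helpers (no HTTP calls), and the `startAt`/`total` keys only mirror the field names used above. The result set is re-sorted on every request, and ISSUE-2 is edited between the first and second page:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Issue:
    key: str
    updated: datetime


# The five issues from the example, one second apart (hypothetical data).
issues = {f"ISSUE-{n}": Issue(f"ISSUE-{n}", datetime(2022, 6, 27, 0, 0, n - 1))
          for n in range(1, 6)}


def search(start_at: int, max_results: int) -> dict:
    """Hypothetical stand-in for a search ordered by `updated` ascending.

    The result set is re-sorted on every call: nothing is snapshotted
    between pages, which is the behaviour described above.
    """
    ordered = sorted(issues.values(), key=lambda issue: issue.updated)
    page = ordered[start_at:start_at + max_results]
    return {"startAt": start_at, "total": len(ordered), "issues": page}


def paginate_by_offset(max_results: int, between_pages=None) -> list[str]:
    """Request startAt = 0, 3, 6, ... until `total` records have been covered."""
    seen: list[str] = []
    start_at = 0
    while True:
        resp = search(start_at, max_results)
        seen.extend(issue.key for issue in resp["issues"])
        start_at += len(resp["issues"])
        if not resp["issues"] or start_at >= resp["total"]:
            return seen
        if between_pages:
            between_pages()  # simulate an edit landing between two requests


def touch_issue_2() -> None:
    issues["ISSUE-2"].updated = datetime(2022, 6, 27, 1, 23, 1)


seen = paginate_by_offset(3, between_pages=touch_issue_2)
print(seen)  # ['ISSUE-1', 'ISSUE-2', 'ISSUE-3', 'ISSUE-5', 'ISSUE-2']
```

ISSUE-4 never appears and ISSUE-2 appears twice, matching the two tables.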

To fix this, I believe we'd have to stop using the Paginator and instead run each subsequent query filtered to records updated at or after the maximum updated timestamp of the previous page.
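A minimal sketch of how such a follow-up query could be built, assuming a hypothetical project key `PROJ` and JQL's `yyyy-MM-dd HH:mm` date literal format. Because JQL dates stop at the minute, the bound has to be truncated, and in the five-issue example the second query comes out identical to the first (the helper names are illustrative only):

```python
from datetime import datetime

JQL_DATE_FORMAT = "%Y-%m-%d %H:%M"  # JQL date literals stop at minutes


def next_page_jql(project: str, max_updated: datetime) -> str:
    """Build a follow-up query: everything updated at or after the newest
    `updated` value on the previous page, oldest first."""
    bound = max_updated.strftime(JQL_DATE_FORMAT)
    return f'project = {project} AND updated >= "{bound}" ORDER BY updated ASC'


def makes_progress(previous_jql: str, next_jql: str) -> bool:
    """If the bound did not move, the next request returns the same page again."""
    return previous_jql != next_jql


first = next_page_jql("PROJ", datetime(2022, 6, 27, 0, 0, 0))   # initial bookmark
second = next_page_jql("PROJ", datetime(2022, 6, 27, 0, 0, 2))  # max of page one
print(second)
print(makes_progress(first, second))  # False: both bounds are "2022-06-27 00:00"
```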

gnilrets, Jun 27 '23

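Using the maximum updated timestamp doesn't work either, since JQL only lets you filter by the minute, and it would be easy to get more than a page of records that were all updated within the same minute. In that case the next query would either return the same page again (with >=) or skip the rest of that minute (with >). Instead, we can query the data in descending order of updated. An issue that is updated mid-sync then jumps to the front of the result set, onto a page that has already been read, so unread issues can only shift to later positions (producing duplicates) rather than earlier ones (producing gaps). The updated issue itself is picked up by the next incremental run, provided the state bookmark is the largest updated value seen.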
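A minimal simulation of the descending-order idea on the same five issues, with the same edit to ISSUE-2 between pages. Assumptions: the in-memory `search`, `sync`, and `touch_issue_2` helpers are hypothetical (no HTTP, and full-precision timestamps rather than JQL's minute granularity), and the next run uses the largest `updated` value seen as its lower bound. The first run returns ISSUE-3 twice and misses ISSUE-2, whose new timestamp is later than the bookmark, so the second run picks it up:

```python
from datetime import datetime

# The five issues from the example: key -> `updated`, one second apart.
issues = {f"ISSUE-{n}": datetime(2022, 6, 27, 0, 0, n - 1) for n in range(1, 6)}


def search(since: datetime, start_at: int, max_results: int):
    """Hypothetical stand-in for `updated >= since ORDER BY updated DESC`.

    Re-sorted on every call, because nothing is snapshotted between pages.
    """
    matching = [(key, updated) for key, updated in issues.items() if updated >= since]
    matching.sort(key=lambda row: row[1], reverse=True)
    return matching[start_at:start_at + max_results], len(matching)


def sync(since: datetime, max_results: int, between_pages=None):
    """Offset pagination over the newest-first result set; returns (key, updated) rows."""
    rows, start_at = [], 0
    while True:
        page, total = search(since, start_at, max_results)
        rows.extend(page)
        start_at += len(page)
        if not page or start_at >= total:
            return rows
        if between_pages:
            between_pages()  # simulate an edit landing between two requests


def touch_issue_2() -> None:
    issues["ISSUE-2"] = datetime(2022, 6, 27, 1, 23, 1)


first = sync(datetime.min, 3, between_pages=touch_issue_2)
bookmark = max(updated for _, updated in first)  # newest value seen: ISSUE-5's 00:00:04
second = sync(bookmark, 3)

print([key for key, _ in first])   # ['ISSUE-5', 'ISSUE-4', 'ISSUE-3', 'ISSUE-3', 'ISSUE-1']
print([key for key, _ in second])  # ['ISSUE-2', 'ISSUE-5']
```

Across the two runs every issue is seen at least once; the cost is duplicates, which this sketch assumes the target deduplicates by key.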

gnilrets, Jun 28 '23