
Error iterating pages

[Open] mcouto-sossego opened this issue 6 years ago • 4 comments

When iterating over pages, some tickets may be missed because of the dynamic nature of the Freshdesk dataset.

Since each page request may be made several minutes after the previous one, agents may update tickets in the meantime, and the results can change between requests.

Example:

page1 = ticket1, ticket2, ticket3, ..., ticket98, ticket99, ticket100 (wait 3 minutes) page2 = ticket104, ticket105, ticket106, ...

In this example, tickets 101, 102 and 103 are missed if 3 random tickets from 1 to 100 are updated during the 3-minute window: with the listing sorted by update time, each updated ticket leaves page 1's range and every later ticket shifts back one position.
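The shift can be reproduced with a small in-memory simulation. This is a sketch under stated assumptions, not tap-freshdesk code: it assumes the listing is sorted by updated_at ascending, with 100 tickets per page addressed by page number, and make_dataset, get_page and touch are hypothetical stand-ins for the Freshdesk API.

```python
# Simulation only: assumes the ticket listing is sorted by updated_at ascending,
# 100 tickets per page, and addressed by page number (offset pagination).
from typing import List, Set, Tuple

PER_PAGE = 100
Ticket = Tuple[int, int]  # (ticket_id, updated_at)


def make_dataset(n: int) -> List[Ticket]:
    # Hypothetical data: ticket i was last updated at time i.
    return [(i, i) for i in range(1, n + 1)]


def get_page(dataset: List[Ticket], page: int) -> List[int]:
    # Hypothetical stand-in for one paged request: sort, then slice by offset.
    ordered = sorted(dataset, key=lambda t: t[1])
    start = (page - 1) * PER_PAGE
    return [tid for tid, _ in ordered[start:start + PER_PAGE]]


def touch(dataset: List[Ticket], ids: Set[int], now: int) -> List[Ticket]:
    # An agent edit bumps updated_at, moving the ticket to the end of the ordering.
    return [(tid, now if tid in ids else ts) for tid, ts in dataset]


data = make_dataset(300)
page1 = get_page(data, 1)
data = touch(data, {5, 42, 77}, now=1000)  # 3 page-1 tickets edited during the wait
page2 = get_page(data, 2)

seen = set(page1) | set(page2)
missed = [tid for tid in range(101, 201) if tid not in seen]
print(page2[:3])  # [104, 105, 106]
print(missed)     # [101, 102, 103]
```

The edited tickets 5, 42 and 77 are not lost (they reappear at the end of the listing), but the three tickets that slid from page 2 into page 1's already-read range are never returned.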

Proposed resolution: download the data for all pages at once, before iterating over conversations and other sub-resources.

The same applies to conversations (also paged data), etc.

Source: tap_freshdesk/__init__.py, function gen_request. Proposal: make all requests inside the "while" loop without any "yield", appending the data to a temporary variable, and only yield rows from that variable after the loop.

mcouto-sossego, Jan 16 '19 19:01
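A minimal sketch of the buffering proposal, assuming a hypothetical fetch_page(page) callable in place of the tap's real HTTP request (the actual gen_request in tap_freshdesk/__init__.py is not reproduced here). It keeps requesting pages inside the loop, appends rows to a temporary list, and yields only after the last page.

```python
from typing import Callable, Iterator, List


def gen_request_buffered(fetch_page: Callable[[int], List[dict]],
                         per_page: int = 100) -> Iterator[dict]:
    """Request every page first, then yield rows.

    fetch_page is a hypothetical stand-in for the tap's HTTP call: it takes a
    1-indexed page number and returns that page's rows.
    """
    buffered: List[dict] = []  # temporary variable holding every row
    page = 1
    while True:
        rows = fetch_page(page)
        buffered.extend(rows)
        if len(rows) < per_page:  # a short or empty page means the listing is done
            break
        page += 1
    # No yield inside the loop: consumers only see rows after the last request.
    yield from buffered


fake_pages = {
    1: [{"id": i} for i in range(1, 101)],
    2: [{"id": i} for i in range(101, 151)],
}
rows = list(gen_request_buffered(lambda p: fake_pages.get(p, []), per_page=100))
print(len(rows))  # 150
```

Because downstream work (conversations, etc.) no longer runs between page fetches, the gap between page requests shrinks to a few HTTP round trips. Edits made inside that shorter gap can still shift pages, and the entire listing is held in memory before anything is emitted.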

I don't think this strategy would work well in practice because of the memory usage it would impose: every page would have to be held in memory before a single row is emitted.

Is there an alternative way to query this data using a min/max combination? A feature like that would allow us to impose a "window" on the data we paginate and only move the window after the iteration has completed.

KAllan357, Jan 16 '19 20:01

There is no option like that in the Freshdesk API (https://developers.freshdesk.com/api/#list_all_tickets).

We are using tap-freshdesk, and from the debug logs we found that 3 to 10 tickets are missing from each 100-ticket page. That is roughly a 7% failure rate for an ETL process, where the only acceptable rate is zero.

mcouto-sossego, Jan 16 '19 22:01
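Without a max bound, a lower bound on update time plus an ascending sort is still enough for a keyset-style loop: always request the first page, then move the lower bound to the newest updated_at seen, so a ticket edited mid-sync moves ahead of the cursor instead of shifting unread tickets out of reach. This is a swapped-in technique, not something proposed in this thread or implemented in tap-freshdesk. FakeBackend, list_since and sync are hypothetical, and the sketch assumes the listing can be filtered by "updated_at >= since" and sorted ascending (check the linked docs for the exact parameters and timestamp granularity).

```python
from typing import Dict, List, Tuple

Ticket = Tuple[int, int]  # (ticket_id, updated_at); integer timestamps for simplicity


class FakeBackend:
    """Hypothetical in-memory stand-in for the ticket listing."""

    def __init__(self, n: int) -> None:
        self.data: List[Ticket] = [(i, i) for i in range(1, n + 1)]
        self.calls = 0

    def list_since(self, since: int, per_page: int) -> List[Ticket]:
        # Assumed semantics: filter updated_at >= since, sort ascending, page 1 only.
        self.calls += 1
        if self.calls == 2:
            # Simulate agents editing three already-read tickets between requests.
            self.data = [(tid, 1000 if tid in {5, 42, 77} else ts)
                         for tid, ts in self.data]
        matching = [t for t in self.data if t[1] >= since]
        matching.sort(key=lambda t: (t[1], t[0]))
        return matching[:per_page]


def sync(backend: FakeBackend, start: int, per_page: int = 100) -> Dict[int, int]:
    """Keyset-style loop: advance a lower bound on updated_at, not a page offset."""
    seen: Dict[int, int] = {}  # ticket_id -> latest updated_at emitted
    since = start
    while True:
        batch = backend.list_since(since, per_page)
        fresh = [t for t in batch if seen.get(t[0]) != t[1]]
        if not fresh:
            # Only already-emitted versions came back: nothing newer is left.
            # Caveat: also stops early if > per_page tickets share one updated_at.
            break
        for tid, ts in fresh:
            seen[tid] = ts
        since = batch[-1][1]  # ">=" re-reads boundary tickets; `seen` filters them
    return seen


result = sync(FakeBackend(300), start=0)
print(len(result))  # 300
print(result[5])    # 1000
```

The edited tickets are emitted again with their new timestamps and no unread ticket is skipped. The weak spot is ties: if more than per_page tickets share one updated_at value, the loop stops and misses the rest, so a real implementation would need a tie-breaker or a smaller time step.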

@mcouto-sossego Are you able to make a PR with your idea?

luandy64, Jan 23 '19 16:01

@mcouto-sossego were you able to find any workaround for this?

We are also using tap-freshdesk with stitchdata in production, and this behaviour (missing tickets while iterating pages) is significantly impacting our system.

dpnsh, Apr 16 '20 11:04