In Dataset Crawler status should be displayed

Open sandeephs1 opened this issue 1 year ago • 1 comments

Is your feature request related to a problem? Please describe. Once a user creates a crawler under a dataset, the crawler's status is unknown, so the user may end up triggering it again and again, which keeps restarting the crawler.

Describe the solution you'd like Once the crawler is triggered, either the start crawler button should be disabled or the status of the crawler should be displayed.

Describe alternatives you've considered Once the crawler is triggered, either the start crawler button should be disabled or the status of the crawler should be displayed.


sandeephs1 avatar Jan 28 '24 15:01 sandeephs1

Hi @sandeephs1, thanks for opening an issue, this is definitely a feature of interest. It might look like an easy enhancement, but it is a bit trickier than it looks. There are 2 alternatives: polling the status or pushing the status.

  • polling: the data.all backend retrieves the Glue Crawler status. Polling happens repeatedly at fixed intervals.
  • pushing: a certain event (GlueCrawlerX) triggers an action in the data.all backend, which updates the info in the UI.

Polling: developing a function that retrieves the Glue Crawler status is relatively easy. The problem is that, in order to have up-to-date information, we need to execute the function constantly. It is doable, but we need to be careful not to hit Glue service quotas (which would cause throttling) and to minimize the number of boto3 calls.
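As a rough illustration of the polling option, here is a minimal sketch that wraps the boto3 Glue `get_crawler` call in a small time-based cache, so that repeated UI requests within a short window do not each turn into an API call. The `CrawlerStatusCache` class, the `can_start_crawler` helper and the 30-second TTL are hypothetical names and values chosen for this example; they are not part of data.all.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CrawlerStatusCache:
    """Hypothetical helper: caches crawler status to limit Glue API calls."""

    def __init__(
        self,
        glue_client: Any,  # e.g. boto3.client("glue") in the environment account
        ttl_seconds: float = 30.0,  # assumed refresh interval, tune to your quotas
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._glue = glue_client
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def get_status(self, crawler_name: str) -> Dict[str, Any]:
        now = self._clock()
        cached = self._cache.get(crawler_name)
        if cached is not None and now - cached[0] < self._ttl:
            # Entry is still fresh: answer from the cache, no API call.
            return cached[1]
        # get_crawler returns the crawler under the "Crawler" key; "State" is
        # READY, RUNNING or STOPPING, and "LastCrawl" holds the last run result.
        crawler = self._glue.get_crawler(Name=crawler_name)["Crawler"]
        status = {
            "state": crawler.get("State"),
            "last_crawl_status": crawler.get("LastCrawl", {}).get("Status"),
        }
        self._cache[crawler_name] = (now, status)
        return status


def can_start_crawler(status: Dict[str, Any]) -> bool:
    """The UI could enable the start button only when the crawler is idle."""
    return status["state"] == "READY"
```

With something like this, the frontend could call a status endpoint as often as it likes while the backend hits Glue at most once per TTL per crawler, and the start button could be disabled whenever `can_start_crawler` returns `False`.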

Pushing: definitely a more elegant solution, but it requires additional development. Communication in data.all always starts from the central account backend and goes to the environment accounts. This feature would introduce communication in the opposite direction, from environment to backend. We could implement this pattern using AWS CloudTrail and an event bus, as shown in the proposal by @SofiaSazonova in https://github.com/data-dot-all/dataall/issues/922 (at the end of the issue)
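To make the pushing option a bit more concrete, here is a minimal sketch of what the receiving side could look like. It assumes the events are the "Glue Crawler State Change" events that Glue emits to EventBridge (rather than CloudTrail-derived events, which the linked proposal may use instead) and that they are forwarded to an event bus the backend can read from. `parse_crawler_event` is a hypothetical helper, and the `detail` field names are written from memory of the Glue event format, so they should be checked against the AWS documentation.

```python
from typing import Any, Dict, Optional

# Event pattern for an EventBridge rule in the environment account
# (assumption: the rule's target forwards matches to a central event bus).
CRAWLER_STATE_CHANGE_PATTERN: Dict[str, Any] = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Crawler State Change"],
}


def parse_crawler_event(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Extract crawler name and state from an event, or None if it is unrelated."""
    if event.get("source") != "aws.glue":
        return None
    if event.get("detail-type") != "Glue Crawler State Change":
        return None
    detail = event.get("detail", {})
    name = detail.get("crawlerName")
    state = detail.get("state")  # e.g. "Started", "Succeeded", "Failed"
    if not name or not state:
        return None
    return {
        "crawler_name": name,
        "state": state,
        "account": event.get("account"),
        "region": event.get("region"),
    }
```

The backend would then store the parsed state against the matching dataset so the UI can display it without any polling of the Glue API.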

We are happy to explore both options. Let us know how important this feature is for you and we will try to prioritize accordingly. Best

dlpzx avatar Feb 05 '24 12:02 dlpzx