"The job has not seen any update for some time." error is not helpful

Open ppf2 opened this issue 11 months ago • 3 comments

We need some better error handling in our connectors.

For example, the error "The job has not seen any update for some time." is not helpful:

          "created_at": "2024-03-18T18:39:04.380Z",
          "deleted_document_count": 0,
          "error": "The job has not seen any update for some time.",
          "indexed_document_count": 16521,
          "indexed_document_volume": 173,
          "job_type": "full",
          "last_seen": "2024-03-18T18:59:06.729398+00:00",
          "metadata": {},
          "started_at": "2024-03-18T18:39:23.061424+00:00",
          "status": "error",
          "total_document_count": null,
          "trigger_method": "on_demand",

Meanwhile, the connector logs are not much help in diagnosing why this is happening either:

image

My hunch is that this is happening because of a SIGTERM on the Enterprise Search instance running the connector service.

image

If so, we should handle both "expected" failures (intentional restarts of the connector service) and "unexpected" ones (unplanned termination of the connector service) more gracefully, with intuitive error handling and logging, so that the user knows they now have to re-run the sync job.
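To make the idea concrete, here is a minimal sketch of what I mean, not the actual connectors code. `SyncJob`, `run_sync`, and the message text are hypothetical names made up for illustration; the only real APIs used are `asyncio` and `signal` from the standard library. On SIGTERM the job is marked as suspended with a message that tells the user what to do next, instead of being left to go idle.

```python
import asyncio
import signal
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncJob:
    # Hypothetical in-memory stand-in for a sync job record.
    status: str = "in_progress"
    error: Optional[str] = None
    indexed_document_count: int = 0


async def run_sync(job: SyncJob, shutdown: asyncio.Event) -> None:
    """Do placeholder work until a shutdown is requested."""
    while not shutdown.is_set():
        job.indexed_document_count += 1  # placeholder for real indexing
        await asyncio.sleep(0.01)
    # Shutdown was requested: record why the job stopped.
    job.status = "suspended"
    job.error = "Connector service was shut down; re-run the sync job."


async def main() -> SyncJob:
    loop = asyncio.get_running_loop()
    shutdown = asyncio.Event()
    # add_signal_handler is only available on Unix event loops.
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, shutdown.set)
    job = SyncJob()
    await run_sync(job, shutdown)
    return job
```

Running `asyncio.run(main())` and sending the process a SIGTERM should leave the job in `suspended` with the message above. An unplanned termination (SIGKILL, OOM) cannot be caught this way, so that case would still need to be detected after restart.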

ppf2, Mar 18 '24 19:03

I don't think this is an expected failure so much as what looks to me like an OOM or some other system crash/interrupt.

But yes, it is odd that after restart, we edit the job's error value to indicate it went idle, instead of setting the job error when we cancel the framework. @wangch079 any thoughts here?
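As a rough sketch of what a more descriptive idle error could look like (this is not the framework's actual check; `idle_error` and the 5-minute threshold are assumptions for illustration), the message could at least include the `last_seen` timestamp and how long the job has been idle:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed threshold, purely for illustration.
IDLE_THRESHOLD = timedelta(minutes=5)


def idle_error(
    last_seen: str,
    now: datetime,
    threshold: timedelta = IDLE_THRESHOLD,
) -> Optional[str]:
    """Return a descriptive error if the job has been idle too long, else None."""
    seen = datetime.fromisoformat(last_seen)
    idle_for = now - seen
    if idle_for <= threshold:
        return None
    return (
        f"No update from the connector service since {seen.isoformat()} "
        f"({int(idle_for.total_seconds())}s ago). The service may have been "
        "restarted or terminated; re-run the sync job."
    )


# Example using the last_seen value from the job document above.
message = idle_error(
    "2024-03-18T18:59:06.729398+00:00",
    now=datetime(2024, 3, 18, 19, 30, tzinfo=timezone.utc),
)
print(message)
```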

seanstory, Mar 18 '24 19:03

From the logs in the screenshot, it looks like a graceful shutdown, so I would expect the sync job to be set to suspended status.

Since it was eventually set to error with "The job has not seen any update for some time.", I believe the sync job was not suspended successfully.

wangch079, Mar 19 '24 07:03

We also have a bug report that jobs are no longer suspended on graceful shutdown: https://github.com/elastic/connectors/issues/2167

Fixing it could potentially get rid of this error entirely.

artem-shelkovnikov, Mar 19 '24 08:03