connectors "The job has not seen any update for some time." error is not helpful

Open ppf2 opened this issue 11 months ago • 3 comments

We need some better error handling in our connectors.

For example, the error "The job has not seen any update for some time." is not helpful.

          "created_at": "2024-03-18T18:39:04.380Z",
          "deleted_document_count": 0,
          "error": "The job has not seen any update for some time.",
          "indexed_document_count": 16521,
          "indexed_document_volume": 173,
          "job_type": "full",
          "last_seen": "2024-03-18T18:59:06.729398+00:00",
          "metadata": {},
          "started_at": "2024-03-18T18:39:23.061424+00:00",
          "status": "error",
          "total_document_count": null,
          "trigger_method": "on_demand",

Meanwhile, the connectors logs are also not that helpful in diagnosing why this is happening:

My hunch is that this is happening because of a SIGTERM on the Enterprise Search instance running the connector service.

If so, we should handle this type of failure more gracefully, whether it is "expected" (an intentional restart of the connector service) or "unexpected" (an unplanned termination of the connector service). We should also provide intuitive error handling and logging so the user knows that they now have to re-run the sync job.
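To make the suggestion concrete, here is a minimal sketch of what graceful handling might look like. It is not the connector service's actual code: `SyncJob`, `suspend_job`, and `run_service` are hypothetical names used only for illustration, and the job is a plain in-memory object rather than a document in an index.

```python
import asyncio
import signal
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncJob:
    """Hypothetical in-memory stand-in for a sync job record."""
    id: str
    status: str = "in_progress"
    error: Optional[str] = None


def suspend_job(job: SyncJob, reason: str) -> None:
    """Mark a running job as suspended and record an actionable message."""
    # Only touch jobs that are actually running; finished jobs stay as they are.
    if job.status == "in_progress":
        job.status = "suspended"
        job.error = f"Sync was interrupted ({reason}); re-run the sync job to continue."


async def run_service(job: SyncJob) -> None:
    """Wait for SIGTERM/SIGINT, then suspend the running job before exiting."""
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    # loop.add_signal_handler is only available on Unix event loops.
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, stop.set)
    await stop.wait()
    suspend_job(job, "connector service shutdown")
```

With something along these lines, a job interrupted by a planned restart would end up as `suspended` with a message telling the user what to do next, rather than later being flagged as idle.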

Mar 18 '24 19:03 ppf2

I don't think this is an expected failure so much as what looks to me like an OOM or some other system crash/interrupt.

But yes, it is odd that after restart, we edit the job's error value to indicate it went idle, instead of setting the job error when we cancel the framework. @wangch079 any thoughts here?

Mar 18 '24 19:03 seanstory

From the logs in the screenshot, it looks like it's a graceful shutdown, and I expect the sync job would be set to suspended status.

Since it was eventually set to error with "The job has not seen any update for some time.", I believe the sync job was not suspended successfully.
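For illustration, the idle check that produces this message could carry more context. The following is only a sketch under assumptions: `idle_error` is a hypothetical helper, and the 5-minute `IDLE_THRESHOLD` is an assumed value, not the service's real setting.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed threshold for illustration; the real service may use a different value.
IDLE_THRESHOLD = timedelta(minutes=5)


def idle_error(
    last_seen: datetime,
    now: datetime,
    threshold: timedelta = IDLE_THRESHOLD,
) -> Optional[str]:
    """Return a descriptive error if the job has been idle too long, else None."""
    idle_for = now - last_seen
    if idle_for <= threshold:
        return None
    return (
        f"No update from the connector service since {last_seen.isoformat()} "
        f"(idle for {int(idle_for.total_seconds())}s, "
        f"threshold {int(threshold.total_seconds())}s). "
        "The service may have been restarted or terminated; re-run the sync job."
    )


last_seen = datetime(2024, 3, 18, 18, 59, 6, tzinfo=timezone.utc)
print(idle_error(last_seen, last_seen + timedelta(minutes=10)))
```

A message like this at least tells the user when the job was last seen and what to do next.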

Mar 19 '24 07:03 wangch079

We also have a bug reported that jobs are not suspended any more on graceful shutdown: https://github.com/elastic/connectors/issues/2167

Fixing it could potentially get rid of this error entirely.

Mar 19 '24 08:03 artem-shelkovnikov
