Second overwatch job crashes with concurrent update error

Open anogues-danone opened this issue 2 years ago • 2 comments

Hello.

We have several Overwatch jobs that write to the same Delta database, each for a different workspace. They are scheduled to run at the same time.

From time to time, one of the jobs fails with the following error:

StreamingQueryException: Query StreamTo_audit_log_raw_events [id = e8234934-5f23-4e5a-ab64-0263b8327a8b, runId = 7c017df9-5507-4e2e-bbf5-c73d5f61691e] terminated with exception: [INTERNAL_ERROR] Execution of the stream StreamTo_audit_log_raw_events failed. Please, fill a bug report in, and provide the full stack trace. Caused by: [INTERNAL_ERROR] Execution of the stream StreamTo_audit_log_raw_events failed. Please, fill a bug report in, and provide the full stack trace. Caused by: AssertionError: assertion failed: Concurrent update to the commit log. Multiple streaming jobs detected for 0

I think it's because both Overwatch jobs write to the same tables, even though they load data from different workspaces. I am not sure this error is expected: from what I understood of Delta tables' concurrency control, merges can cause trouble, but here we are talking about data from different workspaces. So I suspect the problem arises because the table is not partitioned by workspaceid, or something similar.

My question is: is this normal? If it is, then workspaces that are in the same region, and thus share the same database, cannot be loaded in parallel, and we will have to run all these jobs serially. That could be a problem because we can have a large number of workspaces to load.

Or maybe it's simply not possible for both workspaces to use the same database, and we need a database for each workspace?

Regards, Albert

anogues-danone avatar Sep 15 '22 09:09 anogues-danone

Hello, this happens when there is a schema change (evolution) and both runs attempt to write at the same time. You can stagger the jobs by a few minutes to reduce the frequency of the error. The next release (0.6.2.0) has an exception handler for this and will resolve the issue. It is due out very soon.
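As an illustration only, the idea of retrying a run that hits the concurrent-update failure after a staggered wait could be sketched as a generic wrapper. This is a hypothetical Python sketch, not Overwatch's actual exception handler: `run_with_retry`, its parameters, and the choice to match on the error message text from this issue are all assumptions.

```python
import random
import time


def run_with_retry(job, max_attempts=3, base_delay_s=60.0,
                   is_retryable=None, sleep=time.sleep):
    """Run job(); on a retryable failure, wait a growing, jittered delay and retry.

    Hypothetical helper for illustration; not part of Overwatch.
    """
    if is_retryable is None:
        # Assumption: detect the failure by the message text quoted in this issue.
        def is_retryable(exc):
            return "Concurrent update to the commit log" in str(exc)

    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:
            # Give up on the last attempt or on any unrelated error.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            # Linear backoff plus random jitter so that two jobs which
            # collided once are unlikely to collide again on retry.
            delay = base_delay_s * attempt + random.uniform(0, base_delay_s)
            sleep(delay)
```

A job entry point would be passed in as `job`, for example `run_with_retry(lambda: run_workspace_load())`, where `run_workspace_load` is a placeholder for whatever starts the run. The jitter matters more than the backoff here: with a fixed delay, two jobs that started together would simply retry together.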

GeekSheikh avatar Sep 15 '22 11:09 GeekSheikh

Hello @GeekSheikh

It's very strange, as it's not the first run of the job and we haven't modified the library, so we shouldn't get that error, right?

It's true that it happens when the jobs run concurrently, but I don't think there is any schema evolution happening there. Is there something that can trigger schema evolution if we haven't upgraded the fat JAR and it isn't the first execution?

Waiting for the new release.

Thanks!!!

anogues-danone avatar Sep 15 '22 14:09 anogues-danone

Assuming that this is resolved. Please re-open and let us know if you have any further issues. Several new releases have been published since this conversation.

Thank you.

GeekSheikh avatar May 08 '23 21:05 GeekSheikh