[BUG]

Open ggt opened this issue 1 year ago • 9 comments

Environment: Docker

What is the bug? Triggers and alerts created under schema version 0 produce java.lang.NullPointerException: null.

How can one reproduce the bug?

Steps to reproduce the behavior:

  • Create an alert and trigger on an older OpenSearch version
  • Upgrade to 2.13.0

What is the expected behavior? sudo docker-compose logs opensearch -f --tail 100

Do you have any screenshots?

opensearch-node1 | [2024-04-23T10:58:21,094][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [P00APLOG-D01] uncaught exception in thread [DefaultDispatcher-worker-4]
opensearch-node1 | java.lang.NullPointerException: null
opensearch-node1 | at org.opensearch.alerting.MonitorRunnerService$runJob$2.invokeSuspend(MonitorRunnerService.kt:335) ~[opensearch-alerting-2.13.0.0.jar:2.13.0.0]
opensearch-node1 | at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) [kotlin-stdlib-1.8.21.jar:1.8.21-release-380(1.8.21)]
opensearch-node1 | at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:233) [kotlinx-coroutines-core-1.1.1.jar:?]
opensearch-node1 | at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594) [kotlinx-coroutines-core-1.1.1.jar:?]
opensearch-node1 | at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60) [kotlinx-coroutines-core-1.1.1.jar:?]
opensearch-node1 | at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:742) [kotlinx-coroutines-core-1.1.1.jar:?]
opensearch-node1 | uncaught exception in thread [DefaultDispatcher-worker-4]

Differences

<     "schema_version": 0,
>     "schema_version": 8,

<       "source": "Alerting Notification action",
>       "source": "",      //   Suspecting that to be the issue!

Temporary solution: Copy the information from the old alert and trigger into a newly created alert and trigger (see new.txt and old.txt).

TODO: Check in the code whether a null "source" produces that error (the debug logs give no further information).

Thanks!

ggt avatar Apr 23 '24 15:04 ggt

looking into it. added to backlog.

sbcd90 avatar Apr 29 '24 18:04 sbcd90

Can confirm this occurred to 5 of our monitors on AWS hosted OS - the frustrating part about this is the complete silence/"green status" on the OpenSearch dashboards, which makes it look like everything is firing per usual. Only tell is the NPE log thrown every time the monitor was supposed to run.
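
Since the dashboards give no indication, one way to spot affected nodes is to scan the logs for this trace. A rough sketch, assuming log text shaped like the traces posted in this thread; the function name and the regex are illustrative only, not part of any OpenSearch tooling:

```python
import re

# Matches a stack frame such as:
#   at org.opensearch.alerting.MonitorRunnerService$runJob$2.invokeSuspend(MonitorRunnerService.kt:335)
FRAME = re.compile(
    r"at org\.opensearch\.alerting\.MonitorRunnerService\S*"
    r"\(MonitorRunnerService\.kt:(\d+)\)"
)


def find_monitor_runner_npes(log_text):
    """Return the MonitorRunnerService.kt line number of the first matching
    frame seen after each NullPointerException line in the log text."""
    hits = []
    saw_npe = False
    for line in log_text.splitlines():
        if "java.lang.NullPointerException" in line:
            saw_npe = True
        match = FRAME.search(line)
        if saw_npe and match:
            hits.append(int(match.group(1)))
            saw_npe = False  # wait for the next NPE before recording again
    return hits
```

An empty result only means this particular trace was not found in the text that was scanned; it does not prove the monitors are running.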

zakisaad avatar Jun 14 '24 00:06 zakisaad

I have a similar problem.

[2024-06-18T08:45:45,552][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [opensearch] uncaught exception in thread [DefaultDispatcher-worker-8]
java.lang.NullPointerException: null
at org.opensearch.alerting.MonitorRunnerService$runJob$1.invokeSuspend(MonitorRunnerService.kt:345) ~[opensearch-alerting-2.14.0.0.jar:2.14.0.0]
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) [kotlin-stdlib-1.8.21.jar:1.8.21-release-380(1.8.21)]
at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:233) [kotlinx-coroutines-core-1.1.1.jar:?]
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594) [kotlinx-coroutines-core-1.1.1.jar:?]
at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60) [kotlinx-coroutines-core-1.1.1.jar:?]
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:742) [kotlinx-coroutines-core-1.1.1.jar:?]

I ran into this problem a month ago and fixed it (I tried a few things but do not remember which one worked). After a reboot yesterday, the error appeared again.

I am not using Docker, just Logstash -> OpenSearch on a Debian box.

diegargon avatar Jun 18 '24 06:06 diegargon

We had to export -> import all of our monitors using the JSON Export feature, as we had used the UI to define the monitors directly. The 2.13 upgrade has been painful for us.

I suggest you export each of the monitors in JSON form, disable them, go to the Dev Tools console in the sidebar, and re-import each of them using a POST request (you might want to strip the id fields from the exported JSON).
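
The id-stripping step suggested above could be scripted before re-importing. A minimal sketch, assuming the exported monitor is plain JSON and that the fields to drop are keys literally named `id` (the helper name `strip_ids` is hypothetical; check your own export for other identifier fields before relying on it):

```python
import json


def strip_ids(node):
    """Return a copy of a parsed JSON value with every key named exactly
    "id" removed, at any nesting depth. The input is not modified."""
    if isinstance(node, dict):
        return {key: strip_ids(value) for key, value in node.items() if key != "id"}
    if isinstance(node, list):
        return [strip_ids(item) for item in node]
    return node


if __name__ == "__main__":
    # Made-up example shaped loosely like a monitor export (an assumption).
    exported = json.loads(
        '{"name": "example", "triggers": [{"id": "abc", "name": "t1",'
        ' "actions": [{"id": "def", "destination_id": "xyz"}]}]}'
    )
    print(json.dumps(strip_ids(exported), indent=2))
```

Keys such as `destination_id` are left untouched because only the exact key `id` is removed.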

If you define monitors using IaC (Terraform or some other in-house tooling), deleting the monitors and re-creating them via your pipelines should also work as a simple solution.

zakisaad avatar Jun 18 '24 06:06 zakisaad

Yes, I think that is what I did the first time it happened, though manually. This time it didn't work.

Edit: Some detectors work, but when I restart OpenSearch everything begins to fail again.

diegargon avatar Jun 20 '24 06:06 diegargon

If they don't appear, search for my other thread to fix it! I showed there how to modify the request so that you can see them.

ggt avatar Jun 20 '24 15:06 ggt

Hi, the null pointer exception is a bug in 2.13 coming from a log statement here: https://github.com/opensearch-project/alerting/blob/8007d0a43d8077af4c9b3cedc91c747a75b683e6/alerting/src/main/kotlin/org/opensearch/alerting/MonitorRunnerService.kt#L338

It has been fixed in this PR: https://github.com/opensearch-project/alerting/pull/1630/files. Until the code fix is released, here are the steps you can perform as a temporary workaround.

// Check if there are any stuck locks
POST .opensearch-alerting-config-lock/_search?pretty
{
  "query": {
    "match": {
      "released": "false"
    }
  }
}

// Delete all stuck locks
POST .opensearch-alerting-config-lock/_delete_by_query?pretty
{
  "query": {
    "match": {
      "released": "false"
    }
  }
}
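
For anyone who would rather script the two console requests above, here is a rough sketch using only the Python standard library. The index name and query body are taken from the steps above; the base URL, the helper names, and the lack of authentication or TLS configuration are assumptions that will need adjusting for a real cluster:

```python
import json
import urllib.request

# Index name as used in the console requests above.
LOCK_INDEX = ".opensearch-alerting-config-lock"


def stuck_lock_query():
    """Query body matching lock documents that were never released."""
    return {"query": {"match": {"released": "false"}}}


def build_request(base_url, action):
    """Build (but do not send) a POST request for '_search' or
    '_delete_by_query' against the lock index."""
    if action not in ("_search", "_delete_by_query"):
        raise ValueError(f"unsupported action: {action}")
    url = f"{base_url.rstrip('/')}/{LOCK_INDEX}/{action}?pretty"
    body = json.dumps(stuck_lock_query()).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical local endpoint; a secured cluster also needs credentials.
    request = build_request("http://localhost:9200", "_search")
    with urllib.request.urlopen(request) as response:
        print(response.read().decode("utf-8"))
```

Running the `_search` variant first and inspecting the hits before sending `_delete_by_query` mirrors the order of the two steps above.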

jowg-amazon avatar Aug 20 '24 17:08 jowg-amazon