notifo icon indicating copy to clipboard operation
notifo copied to clipboard

Notifo stops processing events?

Open benjyiw opened this issue 11 months ago • 6 comments

Hi again,

I have ran into a bug and unfortunately I don’t have many details to provide. I figure I would just open the issue now to see if that sparked any ideas from your end. I intend to provide more information if/when it happens again, so that I can drill down and provide more logs and perhaps poke around in the DB.

At some point in the past 90 days, Notifo seemingly stopped processing events. I am running Notifo in k8s, so my first hail mary attempt at a resolution was to kill the Notifo pod, and unsatisfyingly that fixed it. This happened for two separate environments, so this is a real bug, but unfortunately I have nothing that I think would be very helpful to provide as I can’t cause the bug to reoccur in order to get more information. I'll provide what I do have below.


Prior to restart of the pod the only logs I saw were simple HTTP access logs.

During pod shutdown, Notifo logged the following errors, I don't know if this is actually problematic:

2024-03-19 11:05:32	
{"logLevel":"Error","stoppedSystemFailed":"Messaging.Consumer(user-events)","timestamp":"2024-03-19T17:05:32Z","severity":"ERROR","exception":{"type":"System.NullReferenceException","message":"Object reference not set to an instance of an object.","stackTrace":"   at RabbitMQ.Client.Impl.ModelBase.BasicCancel(String consumerTag)\n   at RabbitMQ.Client.Impl.AutorecoveringModel.BasicCancel(String consumerTag)\n   at Squidex.Messaging.RabbitMq.RabbitMqSubscription.DisposeAsync()\n   at Squidex.Messaging.Implementation.DelegatingConsumer.StopAsync(CancellationToken ct)\n   at Squidex.Hosting.BackgroundHost.StopAsync(CancellationToken cancellationToken)"}}

2024-03-19 11:05:32	
{"logLevel":"Error","stoppedSystemFailed":"Messaging.Consumer(events)","timestamp":"2024-03-19T17:05:32Z","severity":"ERROR","exception":{"type":"System.NullReferenceException","message":"Object reference not set to an instance of an object.","stackTrace":"   at RabbitMQ.Client.Impl.ModelBase.BasicCancel(String consumerTag)\n   at RabbitMQ.Client.Impl.AutorecoveringModel.BasicCancel(String consumerTag)\n   at Squidex.Messaging.RabbitMq.RabbitMqSubscription.DisposeAsync()\n   at Squidex.Messaging.Implementation.DelegatingConsumer.StopAsync(CancellationToken ct)\n   at Squidex.Hosting.BackgroundHost.StopAsync(CancellationToken cancellationToken)"}}

2024-03-19 11:05:32	
{"logLevel":"Error","stoppedSystemFailed":"Messaging.Consumer(confirms)","timestamp":"2024-03-19T17:05:32Z","severity":"ERROR","exception":{"type":"System.NullReferenceException","message":"Object reference not set to an instance of an object.","stackTrace":"   at RabbitMQ.Client.Impl.ModelBase.BasicCancel(String consumerTag)\n   at RabbitMQ.Client.Impl.AutorecoveringModel.BasicCancel(String consumerTag)\n   at Squidex.Messaging.RabbitMq.RabbitMqSubscription.DisposeAsync()\n   at Squidex.Messaging.Implementation.DelegatingConsumer.StopAsync(CancellationToken ct)\n   at Squidex.Hosting.BackgroundHost.StopAsync(CancellationToken cancellationToken)"}}

After I restarted the application, I saw dozens of logs around new events being processed, here's one example:

{"logLevel":"Information","message":"Received event for app 026fa665-d935-4b55-a99b-edd47647039a with ID 84e0ae4b-7c86-4991-adf8-57860c858fa7 to topic users/9739ce9f-bc50-4c2b-abeb-1ace7da20ab7.","appId":"026fa665-d935-4b55-a99b-edd47647039a","id":"84e0ae4b-7c86-4991-adf8-57860c858fa7","topic":"users/9739ce9f-bc50-4c2b-abeb-1ace7da20ab7","timestamp":"2024-03-19T17:06:03Z","severity":"INFO","category":"Notifo.Domain.UserEvents.Pipeline.UserEventPublisher"}

benjyiw avatar Mar 19 '24 17:03 benjyiw

So you are using RabbitMQ as a backbone for events?

SebastianStehle avatar Mar 19 '24 18:03 SebastianStehle

So you are using RabbitMQ as a backbone for events?

Yep, that's what we are currently using.

benjyiw avatar Mar 19 '24 18:03 benjyiw

I had a look to the code and I think right here a log statement could help: https://github.com/Squidex/libs/blob/main/messaging/Squidex.Messaging.RabbitMq/RabbitMqSubscription.cs#L51

I will provide that asap. Perhaps it helps. We could also think about adding a method to get the number of pending messages and add that to the health check endpoint.,

SebastianStehle avatar Mar 19 '24 19:03 SebastianStehle

I have pushed a new version with the improved logging.

SebastianStehle avatar Mar 21 '24 19:03 SebastianStehle

Thank you! I found the squidex/notifo-dev image and saw that it has a new tag on it. I'll try that out and update this issue if I learn anything more.

benjyiw avatar Mar 22 '24 20:03 benjyiw

~I was curious, so I checked the most recent commits, and I don't actually see the logging changes to the RabbitMQ messaging implementation. Am I missing something?~

Ah, never mind, I see those changes in the lib repo. I had assumed that the link before was in this repo and didn't catch that it was actually elsewhere.

benjyiw avatar Mar 22 '24 20:03 benjyiw