notifo
notifo copied to clipboard
Notifo stops processing events?
Hi again,
I have ran into a bug and unfortunately I don’t have many details to provide. I figure I would just open the issue now to see if that sparked any ideas from your end. I intend to provide more information if/when it happens again, so that I can drill down and provide more logs and perhaps poke around in the DB.
At some point in the past 90 days, Notifo seemingly stopped processing events. I am running Notifo in k8s, so my first hail mary attempt at a resolution was to kill the Notifo pod, and unsatisfyingly that fixed it. This happened for two separate environments, so this is a real bug, but unfortunately I have nothing that I think would be very helpful to provide as I can’t cause the bug to reoccur in order to get more information. I'll provide what I do have below.
Prior to restart of the pod the only logs I saw were simple HTTP access logs.
During pod shutdown, Notifo logged the following errors, I don't know if this is actually problematic:
2024-03-19 11:05:32
{"logLevel":"Error","stoppedSystemFailed":"Messaging.Consumer(user-events)","timestamp":"2024-03-19T17:05:32Z","severity":"ERROR","exception":{"type":"System.NullReferenceException","message":"Object reference not set to an instance of an object.","stackTrace":" at RabbitMQ.Client.Impl.ModelBase.BasicCancel(String consumerTag)\n at RabbitMQ.Client.Impl.AutorecoveringModel.BasicCancel(String consumerTag)\n at Squidex.Messaging.RabbitMq.RabbitMqSubscription.DisposeAsync()\n at Squidex.Messaging.Implementation.DelegatingConsumer.StopAsync(CancellationToken ct)\n at Squidex.Hosting.BackgroundHost.StopAsync(CancellationToken cancellationToken)"}}
2024-03-19 11:05:32
{"logLevel":"Error","stoppedSystemFailed":"Messaging.Consumer(events)","timestamp":"2024-03-19T17:05:32Z","severity":"ERROR","exception":{"type":"System.NullReferenceException","message":"Object reference not set to an instance of an object.","stackTrace":" at RabbitMQ.Client.Impl.ModelBase.BasicCancel(String consumerTag)\n at RabbitMQ.Client.Impl.AutorecoveringModel.BasicCancel(String consumerTag)\n at Squidex.Messaging.RabbitMq.RabbitMqSubscription.DisposeAsync()\n at Squidex.Messaging.Implementation.DelegatingConsumer.StopAsync(CancellationToken ct)\n at Squidex.Hosting.BackgroundHost.StopAsync(CancellationToken cancellationToken)"}}
2024-03-19 11:05:32
{"logLevel":"Error","stoppedSystemFailed":"Messaging.Consumer(confirms)","timestamp":"2024-03-19T17:05:32Z","severity":"ERROR","exception":{"type":"System.NullReferenceException","message":"Object reference not set to an instance of an object.","stackTrace":" at RabbitMQ.Client.Impl.ModelBase.BasicCancel(String consumerTag)\n at RabbitMQ.Client.Impl.AutorecoveringModel.BasicCancel(String consumerTag)\n at Squidex.Messaging.RabbitMq.RabbitMqSubscription.DisposeAsync()\n at Squidex.Messaging.Implementation.DelegatingConsumer.StopAsync(CancellationToken ct)\n at Squidex.Hosting.BackgroundHost.StopAsync(CancellationToken cancellationToken)"}}
After I restarted the application, I saw dozens of logs around new events being processed, here's one example:
{"logLevel":"Information","message":"Received event for app 026fa665-d935-4b55-a99b-edd47647039a with ID 84e0ae4b-7c86-4991-adf8-57860c858fa7 to topic users/9739ce9f-bc50-4c2b-abeb-1ace7da20ab7.","appId":"026fa665-d935-4b55-a99b-edd47647039a","id":"84e0ae4b-7c86-4991-adf8-57860c858fa7","topic":"users/9739ce9f-bc50-4c2b-abeb-1ace7da20ab7","timestamp":"2024-03-19T17:06:03Z","severity":"INFO","category":"Notifo.Domain.UserEvents.Pipeline.UserEventPublisher"}
So you are using RabbitMQ as a backbone for events?
So you are using RabbitMQ as a backbone for events?
Yep, that's what we are currently using.
I had a look to the code and I think right here a log statement could help: https://github.com/Squidex/libs/blob/main/messaging/Squidex.Messaging.RabbitMq/RabbitMqSubscription.cs#L51
I will provide that asap. Perhaps it helps. We could also think about adding a method to get the number of pending messages and add that to the health check endpoint.,
I have pushed a new version with the improved logging.
Thank you! I found the squidex/notifo-dev
image and saw that it has a new tag on it. I'll try that out and update this issue if I learn anything more.
~I was curious, so I checked the most recent commits, and I don't actually see the logging changes to the RabbitMQ messaging implementation. Am I missing something?~
Ah, never mind, I see those changes in the lib repo. I had assumed that the link before was in this repo and didn't catch that it was actually elsewhere.