spring-boot-admin icon indicating copy to clipboard operation
spring-boot-admin copied to clipboard

Bug: Missing entries in the journal

Open cdprete opened this issue 7 months ago • 19 comments

Spring Boot Admin Server information

  • Version: 3.5.0

  • Spring Boot version: 3.5.0

  • Configured Security: None

  • Webflux or Servlet application: WebFlux

Client information

  • Spring Boot versions: 3.5.0

  • Used discovery mechanism: Eureka

  • Webflux or Servlet application: WebFlux

Description

Hello.

I start by saying this may not be a bug but, maybe, just me not clearly understanding how the journal and its events work.

With that being said, I can see that in the journal the entries for my application are not updated since the 27th of May:

Image

But, given that I've ProcessInfoContributor enabled as well as another custom contributor (see https://github.com/codecentric/spring-boot-admin/issues/4253), I would expect to have them update at least once a day or more. On the other end, if I inspect the payload that's shown in the last entry, it reports correctly the latest available information (at least, for my custom contributor that updates once a day).

I wonder if this may be a bug or, maybe, given that I'm using Hazelcast for clustering (see https://github.com/codecentric/spring-boot-admin/issues/4346), if the latter may be the cause.

cdprete avatar Jun 20 '25 05:06 cdprete

Hi @cdprete, I'm not sure if I understand you correctly. Are you missing audit events or process info? The journal view shows data from the auditevents actuator endpoint. ProcessInfoContributor contributes additional info to the info endpoint and won't show up here as long as you don't publish an event yourself.

ulischulte avatar Jun 27 '25 07:06 ulischulte

Hi @ulischulte

Basically I can't explain to myself why there are those gaps as well as potential missed updates in the journal.

cdprete avatar Jun 27 '25 09:06 cdprete

Just to clarify things - the journal view shows data from the auditevents actuator endpoint, not from the info endpoint. Are you missing/expecting data from the info endpoint in the journal view?

ulischulte avatar Jun 27 '25 18:06 ulischulte

Not directly. I mentioned it, because the event is named INFO_CHANGED and its payload is the composed by info returned by the info endpoint. At least, for what I can see in the UI.

So, my assumption was that every time the info change a new INFO_CHANGED event should be visible. But, of course, I can easily be wrong.

As said, what I would like to understand is why there are events missing.

cdprete avatar Jun 27 '25 18:06 cdprete

Hi @cdprete I see. You're right, any InstanceInfoChangedEvent should trigger the InfoUpdater and result in a new entry in the journal. Since we're not able to reproduce your issue, could you please provide a minimal sample project? Regards, Uli

ulischulte avatar Jul 04 '25 09:07 ulischulte

Hi @cdprete I see. You're right, any InstanceInfoChangedEvent should trigger the InfoUpdater and result in a new entry in the journal. Since we're not able to reproduce your issue, could you please provide a minimal sample project? Regards, Uli

Hi @ulischulte. Just enable the ProcessInfoContributor from your side and, ideally, the info should change on every refresh since the memory informations displayed in there change.

cdprete avatar Jul 04 '25 10:07 cdprete

InfoUpdater in backend is called in whatever spring.boot.admin.monitor.info-interval (default: 1m) is set to. This is mainly used to check every minute wether the instance is still up. In addition spring.boot.admin.monitor.info-lifetime (default: 1m) sets the lifetime of the last info data update. Since it is both set to 1m by default, you see a InfoChangedEvent just every 2nd minute.

If you set the lifetime to 0s, then on any info update in backend the information will trigger an InfoChangeEvent, as long as information from actuator have changed.

SteKoe avatar Jul 05 '25 10:07 SteKoe

In this example I have set lifetime to 0s and interval to 10s. As you can see, every ten seconds, the info is updated.

Image

SteKoe avatar Jul 05 '25 10:07 SteKoe

InfoUpdater in backend is called in whatever spring.boot.admin.monitor.info-interval (default: 1m) is set to. This is mainly used to check every minute wether the instance is still up. In addition spring.boot.admin.monitor.info-lifetime (default: 1m) sets the lifetime of the last info data update. Since it is both set to 1m by default, you see a InfoChangedEvent just every 2nd minute.

If you set the lifetime to 0s, then on any info update in backend the information will trigger an InfoChangeEvent, as long as information from actuator have changed.

Good to know this, but it doesn't explain why there are gaps in the events.

cdprete avatar Jul 05 '25 12:07 cdprete

In this example I have set lifetime to 0s and interval to 10s. As you can see, every ten seconds, the info is updated.

Image

Mmm, have you also tried with the same setup:

  • fetch the instances through Eureka
  • configure an embedded Hazelcast in each SBA instance
  • have 2 SBA instances ?

cdprete avatar Jul 05 '25 12:07 cdprete

Good to know this, but it doesn't explain why there are gaps in the events.

I have set the update time to 10s and I get a InfoChangeEvent every 10s. No gaps.

Mmm, have you also tried with the same setup: 2 SBA instances

Yes.

Image

SteKoe avatar Jul 07 '25 13:07 SteKoe

Good to know this, but it doesn't explain why there are gaps in the events.

I have set the update time to 10s and I get a InfoChangeEvent every 10s. No gaps.

Mmm, have you also tried with the same setup: 2 SBA instances

Yes.

Image

With Hazelcast?

cdprete avatar Jul 07 '25 13:07 cdprete

No.

SteKoe avatar Jul 07 '25 14:07 SteKoe

Can you please try with it?

My suspect, as written initially in the ticket, it's that it may be caused by the clustering. If even with that you can't reproduce the issue, then I've just to live with it because I have no other ideas at the moment.

cdprete avatar Jul 07 '25 14:07 cdprete

Hello.

To confirm once more that in the journal there are missing entries, please see the following:

Image

where the claim is that the instances are up since roughly 40m, but there is no indication of (de)registration nor timeout status updates in the journal:

Image

cdprete avatar Jul 31 '25 05:07 cdprete

Hello @ulischulte @SteKoe.

For what we discussed so far, you told me that what gets displayed there is coming from the auditevents endpoint of Spring Boot Actuator. One of my colleagues was complaining that, if an application gets stopped not in a graceful way, SBA reports its "old" status and therefore indicating it's up since X minutes, even if it was just started. To my understanding and for what I can see in Eureka, the application registers (again) to the service registry and such event should be visible in the journal (REGISTERED), but it is not. I dug further in the Spring Boot docs about such endpoint and, for what I can read, it's not enabled by default since in my applications I'm not providing an instance of AuditEventRepository. The output to /actuator which lists all the available endpoints further confirms this since such endpoint is not in there.

So... how is this really working then...? 😅

cdprete avatar Sep 19 '25 11:09 cdprete

Hi,

For what we discussed so far, you told me that what gets displayed there is coming from the auditevents endpoint of Spring Boot Actuator.

We confused Journal Events and Audit Events in this issue. When we have a look at Journal View, we are only using SBA specific events, which are delivered via event stream on /instances/events. The events are basically the ones you see in the journal table. auditevents itself is a Spring Boot Actuator that we just use in front-end to display the contents. Sorry for mixing this up.

One of my colleagues was complaining that, if an application gets stopped not in a graceful way, SBA reports its "old" status and therefore indicating it's up since X minutes, even if it was just started.

That's right. We update the status time whenever an instance is newly registered (i.e. it was never known to SBA before) or when the status changes (e.g. UP -> OFFLINE, OFFLINE -> UP, UP -> DOWN). Instances are identified by a hash of the health URL just like Eureka does. Since we poll (in default) every 10 seconds for the current state, it might happen that the instance restarts within this interval and is still considered as "UP since X Minutes".

At this point I assume that you expect the uptime to be reset to zero as the instance has been restarted, right?

To my understanding and for what I can see in Eureka, the application registers (again) to the service registry and such event should be visible in the journal (REGISTERED), but it is not.

It is not shown as REGISTERED but REGISTRATION_UPDATED because it is already known to SBA.

Examples

Case 1: Service A registers at SBA for the very first time and gets SIGKILL

  • Service A registers at Eureka which is polled by SBA

  • Journal shows

    • REGISTERED (Eureka logs: "Registered service Service A with status UP")
    • STATUS_CHANGED
    • ENDPOINTS_DETECTED
    • INFO_CHANGED
  • Service A gets a SIGKILL (hence, no graceful shutdown) and is restarted immediately

  • Journal shows

    • REGISTRATION_UPDATED (Eureka logs: "Registered service Service A with status UP")

Case 2: Service A registers at SBA for the very first time and has graceful shutdown

  • Service A registers at Eureka which is polled by SBA

  • Journal shows

    • REGISTERED (Eureka logs: "Registered service Service A with status UP")
    • STATUS_CHANGED
    • ENDPOINTS_DETECTED
    • INFO_CHANGED
  • Service A is terminated gracefully

  • Journal shows

    • STATUS_CHANGED (Eureka logs: "Registered service Service A with status DOWN")
    • DEREGISTERED

SteKoe avatar Sep 19 '25 16:09 SteKoe

Hi @SteKoe

We confused Journal Events and Audit Events in this issue. When we have a look at Journal View, we are only using SBA specific events, which are delivered via event stream on /instances/events. The events are basically the ones you see in the journal table. auditevents itself is a Spring Boot Actuator that we just use in front-end to display the contents. Sorry for mixing this up.

Ok, now it makes sense.

At this point I assume that you expect the uptime to be reset to zero as the instance has been restarted, right?

Yes, otherwise, as per my colleague complain, it's very misleading.

It is not shown as REGISTERED but REGISTRATION_UPDATED because it is already known to SBA.

I never saw this event, but I'll try to test it now that I've more insights.

cdprete avatar Sep 19 '25 16:09 cdprete

@SteKoe the complain of my colleague is off-topic when considering the scope of the current ticket we're in. Therefore, I've opened https://github.com/codecentric/spring-boot-admin/issues/4664 for it.

cdprete avatar Sep 20 '25 08:09 cdprete