AspNetCore.Diagnostics.HealthChecks Support more Azure EventHub client types

What would you like to be added:

Today the AzureEventHubHealthCheck only supports one type of client: EventHubProducerClient.

https://github.com/Xabaril/AspNetCore.Diagnostics.HealthChecks/blob/42c2c732a5369849d59f9d79cdd4be497da98786/src/HealthChecks.Azure.Messaging.EventHubs/AzureEventHubHealthCheck.cs#L8-L13

We should add support for the other kinds of EventHub clients:

EventHubConsumerClient
PartitionReceiver

There is another client EventProcessorClient, however this client doesn't have a GetXPropertiesAsync() API that we use in the AzureEventHubHealthCheck. So it may not be possible to implement a Health Check based on that client.

Why is this needed:

.NET Aspire supports registering the above 4 client types. In trying to enable health checks in .NET Aspire, only the EventHubProducerClient can be used directly. See https://github.com/dotnet/aspire/pull/4139.

cc @adamsitnik @jsquire @oising

Jul 10 '24 16:07 eerhardt

The processor supports runtime properties as events are streamed from the service by setting EventProcessorClientOptions.TrackLastEnqueuedEventProperties. This can be read via the ReadLastEnqueuedEventProperties method on the partition context associated with the processing args or using the ReadLastEnqueuedEventProperties protected method on the class.

Neither is really appropriate for a pull-based approach in an external package. One option would be to do it via callback that apps could invoke as part of their event processing. Otherwise, the recommendation would be to use the producer client as the health check sentinel.

Jul 10 '24 18:07 jsquire

I'm not sure I understand the reason to allow more than one type of client to validate event hub health. Is this just to be able to reuse the underlying aspire client instance rather than have to spin up a sister client 75% of the time?

Jul 11 '24 13:07 oising

Is this just to be able to reuse the underlying aspire client instance rather than have to spin up a sister client 75% of the time?

Correct. See https://github.com/dotnet/aspire/tree/main/src/Components#health-checks. Specifically:

Consider whether the Health Check should reuse the same client object registered in DI by the component or not. Reusing the same client object has the advantages of getting the same configuration, logging, etc during the health check.

Jul 11 '24 14:07 eerhardt

Is this just to be able to reuse the underlying aspire client instance rather than have to spin up a sister client 75% of the time?

Correct. See https://github.com/dotnet/aspire/tree/main/src/Components#health-checks. Specifically:

Consider whether the Health Check should reuse the same client object registered in DI by the component or not. Reusing the same client object has the advantages of getting the same configuration, logging, etc during the health check.

Okay. Would this issue be better placed on the Azure SDK repo?

Jul 11 '24 15:07 oising

Would this issue be better placed on the Azure SDK repo?

No. The EventHub health checks library in this repo only supports EventHubProducerClient. It can add support for EventHubConsumerClient and PartitionReceiver without a change in the Azure SDK. It would just need new constructors that take those clients, and then calling the corresponding "GetXPropertiesAsync()" methods on the respective clients.

For EventProcessorClient Aspire can continue using the producer client as the health check sentinel.

Jul 11 '24 15:07 eerhardt

Okay. Would this issue be better placed on the Azure SDK repo?

This opens up a very interesting discussion about the definition of "health check" that the Azure SDK developers have not finished arguing over. 😄

Jul 11 '24 18:07 jsquire

@jsquire just curious, could you give an update on your comment above?

Jan 30 '25 19:01 danmoseley

@danmoseley: Happy to, but there's nothing new to share. While emitting some form of metrics is on our long-term roadmap, there's no work currently being done in this area. My comment above was intended to be more tongue-in-cheek, as part of exploring potential metrics designs, we've seen customer feedback and our own internal discussions disagreeing on what the "correct" definition of a health check is.

For example:

Is it something client-focused that indicates "yes, the processor is still functioning and attempting to read. If you have no events flowing, don't panic there's just no data in the partitions."
Is it something application-focused that answers "is my processing falling behind" by testing the depth of the event backlog waiting to be processed? (and further, would that be a per-partition check or an overall check on the Event Hub resource?)
Is that something end-to-end focused where the client sends periodic punctuation events and then confirms that data flows round-trip?

These are some of the more popular ones that have surfaced. Our take-away there thus far is that these are all valid and potentially valuable in different contexts; a "health check" means different things to different people. This is why we've been focused on "what can we emit for metrics that would let people build what is right for their scenario" rather than any sort of built-in health check.

Okay. Would this issue be better placed on the Azure SDK repo?

To be clear, this approach would resolve to a request to add the standard metadata operations from the lower-level clients (GetPartitionProperties, GetPartitionIds, GetEventHubProperties) to the processor types so that applications wouldn't need to create a producer or consumer instance just to pull runtime info for monitoring. I don't know that it adds a ton of value, but it's something that we're open to, assuming we're able to make a strong enough case to get approval for the API changes.

Jan 31 '25 17:01 jsquire

I wonder if it would just make more sense to call the event hubs REST api instead of trying to expand this to all client types. Again, it falls down to what level of health check you want.

Jan 31 '25 17:01 oising