[Inference API] Adds Amazon Bedrock support to Inference API

Open markjhoy opened this issue 1 year ago • 0 comments

  • Have you signed the contributor license agreement?
  • Have you followed the contributor guidelines?
  • If submitting code, have you built your formula locally prior to submission with gradle check?
  • If submitting code, is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed.
  • If submitting code, have you checked that your submission is for an OS and architecture that we support?
  • If you are submitting this code for a class then read our policy for that.

markjhoy avatar Jun 27 '24 23:06 markjhoy

Open questions

We build health indicators with AbstractHealthIndicator(slo.getFailedMessage()). It's unclear to me whether the failed message ever appears in the /actuator/health response body.

Some of the SLOs are a combination of two or more indicators. For example, in jvmTotalMemory, we set a relatively low threshold on GC overhead (20% of CPU time over the last 5 minutes) if there is 90% pool utilization as well. These composite SLOs are registered with the relatively new CompositeHealthContributor.fromMap(..) API. Unfortunately, I can't see a way to attach details or a failed message to the composite itself. I'd like to add details and a failed message for each contributing health indicator, and potentially a different one describing what it means for a set of such indicators to fail together. @philwebb, do you have any suggestions? The example below shows what I think might be nice (specifically the details directly underneath jvmTotalMemory).

"jvmTotalMemory": {
  "status": "UP",
  "details": {
     "someTag": "someValue"
  },
  "components": {
    "jvmGcOverhead": {
      "status": "UP",
      "details": {
        "value": "0.01%",
        "mustBe": "<20%",
        "unit": "percent CPU time spent"
      }
    },
    "jvmMemoryConsumption": {
      "status": "UP",
      "details": {
        "value": "9.09%",
        "mustBe": "<90%",
        "unit": "maximum percent used in last 5 minutes"
      }
    }
  }
}
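To make the shape above concrete, here is a minimal plain-Java sketch of a composite that carries its own details and failed message alongside its components. The types `Indicator` and `Composite` are hypothetical stand-ins, not Spring Boot or Micrometer classes, and the rule that the composite is DOWN only when all components are DOWN together is an assumption drawn from the description above, not confirmed behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CompositeSloSketch {

    /** Hypothetical stand-in for one health result; not a Spring Boot type. */
    record Indicator(String status, Map<String, String> details) {}

    /** Hypothetical composite with its own failed message and details. */
    record Composite(String failedMessage,
                     Map<String, String> details,
                     Map<String, Indicator> components) {

        /** Assumption: DOWN only when every component is DOWN at the same time. */
        String status() {
            boolean allDown = !components.isEmpty()
                    && components.values().stream().allMatch(c -> "DOWN".equals(c.status()));
            return allDown ? "DOWN" : "UP";
        }
    }

    /** Builds the jvmTotalMemory composite from two measured percentages. */
    static Composite jvmTotalMemory(double gcOverheadPercent, double poolUsedPercent) {
        Map<String, Indicator> components = new LinkedHashMap<>();
        components.put("jvmGcOverhead", new Indicator(
                gcOverheadPercent < 20 ? "UP" : "DOWN",
                Map.of("value", gcOverheadPercent + "%", "mustBe", "<20%")));
        components.put("jvmMemoryConsumption", new Indicator(
                poolUsedPercent < 90 ? "UP" : "DOWN",
                Map.of("value", poolUsedPercent + "%", "mustBe", "<90%")));
        return new Composite(
                "GC overhead and memory pool utilization are both too high", // illustrative text
                Map.of("someTag", "someValue"),
                components);
    }

    public static void main(String[] args) {
        Composite health = jvmTotalMemory(0.01, 9.09);
        System.out.println(health.status() + " " + health.components().keySet());
    }
}
```

The point of the sketch is only that the composite's own `details` and `failedMessage` sit at the same level as `components`, which is what the JSON above asks for.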

jkschneider avatar May 04 '20 22:05 jkschneider

Thanks @jkschneider! I'll target this for 2.4.x so we remember to take a look as soon as the 2.3.0 release crunch is over.

philwebb avatar May 05 '20 20:05 philwebb

We haven't had a chance to take a look at this change or to upgrade to Micrometer 1.6. We're already quite late in the milestone cycle and we don't think we'll have time to address this change properly. We need to take a look at this change and its implications (including the new concepts introduced and the Health endpoint format).

bclozel avatar Sep 28 '20 15:09 bclozel

@snicoll and I discussed this today. There are a few things that came up:

  1. Since we decided that the diskspace health indicator should ideally be something that can be configured in the monitoring system, this feels very much along those lines. If we decide to surface the SLOs as health indicators, we should align our strategy for diskspace accordingly. Even with the deprecation of the diskspace indicator, we could surface that information in health via the SLOs.
  2. We are not sure if having a top-level component for every SLO is the best way to do this. Maybe having some sort of nested structure for the SLOs might be a better alternative.
  3. From an API perspective, we could have an API to expose SLOs, which we could use to create the composite, rather than the current approach, which registers beans within a bean method.
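Points 2 and 3 could be sketched roughly as follows, in plain Java with no Spring dependency. `SloRegistrySketch` and `Slo` are hypothetical names invented for illustration. The sketch assumes SLOs are registered through an explicit API and reported under a single nested "slos" component, which is UP only when every registered objective is met.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

class SloRegistrySketch {

    /** Hypothetical SLO: a name plus a check that returns true while the objective is met. */
    record Slo(String name, BooleanSupplier met) {}

    private final List<Slo> slos = new ArrayList<>();

    /** Explicit registration, instead of registering beans from within a bean method. */
    SloRegistrySketch register(Slo slo) {
        slos.add(slo);
        return this;
    }

    /** Builds one nested "slos" component rather than a top-level entry per SLO. */
    Map<String, Object> health() {
        Map<String, Object> components = new LinkedHashMap<>();
        boolean allMet = true;
        for (Slo slo : slos) {
            boolean met = slo.met().getAsBoolean();
            allMet &= met;
            components.put(slo.name(), Map.of("status", met ? "UP" : "DOWN"));
        }
        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("status", allMet ? "UP" : "DOWN");
        nested.put("components", components);
        return Map.of("slos", nested);
    }
}
```

A caller would register, say, a disk space objective and a JVM memory objective, then read everything back from `health().get("slos")`.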

Flagging for team-meeting so that we can discuss this on the next team call.

mbhave avatar Sep 16 '21 14:09 mbhave

We discussed this some more as a team today and our feeling is that we're not sure that we have a strong enough opinion to auto-configure SLOs as health indicators. We can see that it may make sense for some users but not for others. For example, in some cases, a proxy will already be aware of the error rate for requests that it routes to an application instance. In this case, exposing the information via a health endpoint that it will also be monitoring will be of minimal value, and may even be harmful depending on how things behave when the application's health changes. For users that do want to expose SLOs as health indicators, we could provide some classes that make it easier to do so.

Since this proposal was made, we've also introduced the concept of application state. It may be that some users want to configure things such that an unmet objective results in a change to the application state to indicate that it's no longer ready, for example. We could provide some helper classes that a user can configure to connect SLOs to application state.
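As a rough illustration of what such a helper might do, here is a plain-Java sketch that maps an objective to a readiness value. `SloReadinessSketch` and its `Readiness` enum are hypothetical stand-ins. The enum names are only modelled on readiness states, and no Spring Boot type or event is used here.

```java
import java.util.function.BooleanSupplier;

class SloReadinessSketch {

    /** Stand-in enum for illustration; not Spring Boot's readiness type. */
    enum Readiness { ACCEPTING_TRAFFIC, REFUSING_TRAFFIC }

    private final BooleanSupplier objectiveMet;
    private Readiness current = Readiness.ACCEPTING_TRAFFIC;

    SloReadinessSketch(BooleanSupplier objectiveMet) {
        this.objectiveMet = objectiveMet;
    }

    /** Re-evaluates the objective and returns true only when the state changed. */
    boolean refresh() {
        Readiness next = objectiveMet.getAsBoolean()
                ? Readiness.ACCEPTING_TRAFFIC
                : Readiness.REFUSING_TRAFFIC;
        boolean changed = next != current;
        current = next;
        return changed;
    }

    Readiness current() {
        return current;
    }
}
```

Returning whether the state changed lets the caller decide when to publish a change, so a repeatedly unmet objective does not produce a new notification on every evaluation.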

We discussed possibly auto-configuring the HealthMeterRegistry, automatically adding any ServiceLevelObjective beans to it. We could auto-configure some ServiceLevelObjective beans such as JvmServiceLevelObjectives.MEMORY and OperatingSystemServiceLevelObjectives.DISK rather than hard-coding them as proposed here. This would align with our auto-configuring of Micrometer's various Jvm…Metrics classes.

Overall, our feeling was that we would stop short of anything that exposes the SLOs externally, instead auto-configuring the HealthMeterRegistry and supporting beans and making it easier for a user to then plug the SLOs into health or application state in a way that meets their specific needs.

@shakuzen @jonatan-ivanov Could we have your input here please? Are we right to be cautious and just give users the parts they need and leave them to join things together or is there some clearly established usage of HealthMeterRegistry and SLOs that means that we can proceed with confidence in a particular direction?

wilkinsona avatar Sep 17 '21 16:09 wilkinsona

Sorry, operation error.

mjf1310 avatar Nov 03 '21 16:11 mjf1310