How should a cds-service let an EHR know about errors? OperationOutcome?
How should a cds-service let an EHR know about errors?
Following RESTful best practices, a cds-service should return HTTP error statuses to an EHR when appropriate:
- 403 Forbidden for authnz failures
- 405 Method Not Allowed for incorrect HTTP method
For business-logic type errors, it might make sense for the service to return a 500 containing a FHIR OperationOutcome with values from the FHIR IssueType valueset, but further constrained.
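As a rough illustration of that idea, here is a minimal sketch of a 500 response body carrying an OperationOutcome. The `resourceType`, `issue`, `severity`, `code` and `diagnostics` fields come from the FHIR OperationOutcome resource; the subset in `ALLOWED_ISSUE_CODES` and the helper names are assumptions made up for this example, since nothing here says which IssueType codes a constrained set would keep.

```python
import json

# Hypothetical constrained subset of FHIR IssueType codes; the actual
# subset (if any) has not been decided anywhere in this discussion.
ALLOWED_ISSUE_CODES = {"processing", "business-rule", "invalid", "timeout", "exception"}


def build_operation_outcome(code: str, diagnostics: str) -> dict:
    """Build a minimal FHIR OperationOutcome with a single error issue."""
    if code not in ALLOWED_ISSUE_CODES:
        # Fall back to a generic code rather than emit one outside the subset.
        code = "processing"
    return {
        "resourceType": "OperationOutcome",
        "issue": [
            {"severity": "error", "code": code, "diagnostics": diagnostics}
        ],
    }


def error_response(code: str, diagnostics: str) -> tuple[int, str]:
    """Return an (HTTP status, JSON body) pair for a business-logic failure."""
    return 500, json.dumps(build_operation_outcome(code, diagnostics))


status, body = error_response("business-rule", "Patient weight is required for dosing rule")
print(status, body)
```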
@isaacvetter - Are there particular scenarios or use cases that prompted you to raise this question?
I would imagine the CDS service should return standard HTTP error codes when appropriate. I hadn't thought of the need to explicitly call this out in our spec for CDS services.
Regarding the 'business-logic' errors, what would the EHR do with these errors? This question likely goes back towards the scenarios/use cases that you are thinking of.
Related to this discussion, an outstanding/unresolved question I've had is what expectations should be placed on the EHR to indicate CDS service invocation failures or timeouts? This question can be framed both around error and SLA expectations. I don't know that we need strong requirements around this, but we could instead provide acknowledgement and perhaps guidance on how this can be handled.
@kpshek I think there should also be explicit thought about what happens when a CDS service is offline, down for maintenance, or otherwise unavailable. Some sort of graceful degradation would be ideal.
@olbrich - I think we can use RESTful conventions of standard HTTP error code for this particular use case. For instance, in the case of a CDS service down for maintenance, I think the CDS service can simply return a 503 status:
The server is currently unable to handle the request due to a temporary overloading or maintenance of the server.
I call this out because I don't think it's necessary to add any additional fields to the response of the CDS service.
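To make that concrete, here is a minimal sketch of a service answering with a bare 503 while it is down for maintenance. The function name, its parameters and the tuple it returns are all assumptions for illustration. `Retry-After` is a standard HTTP header, but using it here is my own optional addition and not something anyone has asked for.

```python
from http import HTTPStatus
from typing import Optional


def cds_service_response(in_maintenance: bool, retry_after_seconds: Optional[int] = None):
    """Return a (status, headers, body) tuple for a CDS service call.

    Hypothetical helper: during maintenance it answers 503 with an empty
    body and no extra response fields.
    """
    if in_maintenance:
        headers = {}
        if retry_after_seconds is not None:
            # Optional hint telling the caller when to try again.
            headers["Retry-After"] = str(retry_after_seconds)
        return HTTPStatus.SERVICE_UNAVAILABLE, headers, ""
    # Normal path: a valid response with no cards.
    return HTTPStatus.OK, {"Content-Type": "application/json"}, '{"cards": []}'


status, headers, body = cds_service_response(in_maintenance=True, retry_after_seconds=120)
print(int(status), headers, repr(body))
```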
The next (and more important) question is how should the EHR handle this scenario when it occurs? This is where I'm unsure if this is something we can handle in the spec. Instead, I'm thinking we should perhaps create an implementation or best practices guide that calls out these scenarios to raise visibility and provide examples of how they should be dealt with. The reason I'm thinking this is that the nature of the CDS service and its importance to the organization's workflow/business will determine how the EHR should handle failures such as this.
Agree with @kpshek regarding just using/returning standard HTTP error codes.
It seems unnecessary to return additional FHIR objects related to possible 500s - if there are recurring issues it's the CDS service's responsibility to fix their app or communicate possible issues to the EHR (they will surely be very noisy if there are issues as long as they have a publicly available contact). Doesn't feel like there is a strong need to add anything to the standard to cover this.
In regards to EHR responsibility for possibly tracking errors/timeouts and SLA levels, I think it's okay to trust that the client and the CDS vendor can manage this as per their existing contractual obligations without needing to address it in the standard.
During scheduled downtime, I don't think the standard needs to define how the EHRs handle a resulting 503. Perhaps some suggestions or a best practice guide would be okay, but it's up to the EHR's discretion on what gracefully handling a 503 looks like. The CDS vendor should be communicating with the client well ahead of time to minimize the impact of any major workflow disruptions anyways.
+1 for OperationOutcome
Having details about the errors helps with troubleshooting. If you get back a 500 without any additional info, the only recourse to try to figure out what's wrong is to ask the server owners to check their logs. If there are some details about what's going wrong, maybe it is possible to fix it directly, or at least not have to start the troubleshooting from square one.
Why should the CDS Service return an OperationOutcome? I've not known 500 errors to be parsed by the caller as due to their nature ('Internal Server Error'), they are opaque. To that, I don't see the EHR parsing the 500 error response and doing anything with it.
So, why define/constrain the CDS Service to return any particular data in the body of a 500 response?
Not that much experience with CDS - but when we act as a FHIR client we (try to) parse and log the error messages we got back from the servers - and that's very valuable in tracking down and solving problems. I would imagine that the same would apply for CDS calls.
I agree that having the EHR log errors from CDS Services would be valuable. I have no problems if a CDS Service wants to log an OperationOutcome on their own, but I don't see a need to prescribe this in the spec.
At the Madrid Connectathon this past weekend (2017-05-07), a large group of us participating in the CDS Hooks track discussed this issue in an offline discussion. This group of ~12 represented a broad set of stakeholders, from multiple EHR vendors, several CDS Service providers, and a healthcare organization. What follows is the summary of our discussion and the consensus of the group.
Our documentation should provide specific guidance for CDS Services to return appropriate HTTP status codes in cases of errors. Since we already leverage RESTful conventions, it makes most sense for us to make use of standard HTTP status codes.
Our documentation should provide guidance for CDS Services to return error information in the response body as they see fit. However, we should not prescribe a particular error response structure. This does not preclude CDS Services from returning an OperationOutcome.
Our documentation should provide guidance for EHRs to log errors from CDS Service calls as they see fit. As EHRs are bound by their own technical and operational practices, as well as those of the healthcare organization, the manner in which they address such errors will be unique. The group agreed that our documentation should note the importance of logging such errors, especially when debugging integration between a CDS Service and the EHR.
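As one possible reading of the logging guidance, here is a minimal sketch of an EHR-side handler that logs any failed CDS Service call and then degrades to showing no cards. The function name, the logger name and the "no cards on failure" policy are assumptions for illustration only; the group deliberately left the actual handling up to each EHR.

```python
import json
import logging

logger = logging.getLogger("cds_hooks_client")  # hypothetical logger name


def handle_cds_response(service_id: str, status: int, body: str) -> list:
    """Return the cards from a successful call; log and return [] otherwise."""
    if 200 <= status < 300:
        try:
            return json.loads(body).get("cards", [])
        except (ValueError, AttributeError):
            # Body was not JSON, or was JSON but not an object.
            logger.error("CDS service %s returned an unparseable body: %s",
                         service_id, body[:500])
            return []
    # Log status and a truncated body so integration issues can be debugged.
    logger.error("CDS service %s returned HTTP %d: %s", service_id, status, body[:500])
    return []


print(handle_cds_response("example-service", 503, "Down for maintenance"))
```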
As an outcome, this leaves a client hanging. What should the client be prepared to receive and process so that it can display a coherent message to the user? This matters. There's nothing users - and troubleshooting programmers - hate more than a blank screen when there could be some useful information presented. I'd prefer OperationOutcome, but if we can't settle on that, can we nail it down to a very short list of formats, so that the client can know how to display errors it gets?
Here are some possibilities for a CDS service to return:
- OperationOutcome
- Text in the body
- a valid CDS-hooks response with a single card labelled as an error
- a full html page with a lovely stack dump that no one can read in text - or a lovely haiku. ;-(
- an empty body with minimally useful message on the status code
- a json object that describes the error
- markdown that describes the error
I've seen all these from generic web applications.... please can we make this a bounded problem for clients
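To show what an unbounded problem looks like from the client side, here is a minimal sketch of a function that tries to turn several of the formats listed above into one displayable message. The function name, the order of the checks, the `message`/`error` keys and the 200-character cut-off are all assumptions for illustration; none of this is agreed behaviour.

```python
import json


def summarize_error_body(body: str) -> str:
    """Best-effort conversion of an error response body into one short message."""
    text = body.strip()
    if not text:
        return "The CDS service returned an error with no details."
    try:
        data = json.loads(text)
    except ValueError:
        # Plain text, markdown or HTML: show a truncated copy as-is.
        return text[:200]
    if isinstance(data, dict):
        if data.get("resourceType") == "OperationOutcome":
            issues = data.get("issue")
            if isinstance(issues, list):
                parts = [i.get("diagnostics") or i.get("code", "unknown issue")
                         for i in issues if isinstance(i, dict)]
                if parts:
                    return "; ".join(parts)
        cards = data.get("cards")
        if isinstance(cards, list) and cards and isinstance(cards[0], dict):
            # A CDS Hooks response whose first card describes the error.
            return cards[0].get("summary", "Unspecified error card")
        for key in ("message", "error"):  # hypothetical generic JSON error keys
            if isinstance(data.get(key), str):
                return data[key]
    return text[:200]


print(summarize_error_body('{"resourceType": "OperationOutcome", '
                           '"issue": [{"severity": "error", "code": "timeout"}]}'))
```

Even this short version needs four branches and a fallback, which is the argument for agreeing on a small set of formats.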
From a REST perspective, every time one invokes a service of this type, one is requesting the creation of a new unique resource (the calculation or evaluation result).
When there is a service error, it is of course a 500 error. When the rule evaluation cannot be performed due to an authentication or authorization failure, it can be either 404 or 403 (see https://httpstatuses.com/403). Note that 403 does not have to mean that there is a security access related issue; it could also mean that there is bad data, so that the service cannot perform the requested operation.
One catch here: what if there is a partial failure? For example, I have a request for evaluation on a rule that contains a set of sub-rules. Only part of that failed due to data validation. In this case, I suppose that the service implementer can choose to either fail the whole request and return 403 with details of the failure in the response body, or fail only partially. If the latter, it would be a status code 200 but include the status of each sub-rule in the response body, so that the invoking system can take proper action precisely on the failed part(s).
Either implementation, using 40X or including the error in the response, has advantages and disadvantages. The 40X is simpler but the 200 with detail in the response gives more control. Of course, one might also choose to make the choice a user choice (different URLs for the two options).
One site that I found that might be helpful is the google doubleclick-search API reference: https://developers.google.com/doubleclick-search/v2/standard-error-responses. Can we copy what they are doing?
Also, I think that one simple rule of thumb is to separate the API level error from the application level error (or rule level error). For API level error, return as HTTP error codes without calculation result (since it is not applicable). For application level error, return HTTP 200 status code and any errors in the response body. This approach simplifies the server and client implementation, and more importantly, it allows one to independently vary the error handling for API and the implementation.
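Here is a minimal sketch of that rule of thumb, including the partial-failure case with sub-rules. Everything in it is hypothetical: the function name, the `authorized` flag, the use of `ValueError` to signal a data validation failure, and the `results` body shape, which is not the CDS Hooks response format.

```python
def evaluate_request(authorized: bool, sub_rules: dict):
    """Return (status, body) for a rule evaluation request.

    sub_rules maps a rule name to a zero-argument callable that returns a
    result or raises ValueError on bad input data.
    """
    # API-level error: HTTP error status, no calculation result.
    if not authorized:
        return 403, {}

    # Application-level errors: HTTP 200 with a per-sub-rule status.
    results = {}
    for name, rule in sub_rules.items():
        try:
            results[name] = {"status": "ok", "result": rule()}
        except ValueError as exc:
            results[name] = {"status": "error", "detail": str(exc)}
    return 200, {"results": results}


def dose_check():
    return "within range"


def weight_check():
    raise ValueError("missing weight")


print(evaluate_request(True, {"dose": dose_check, "weight": weight_check}))
```

The caller can then act on the failed sub-rule alone while still using the results of the ones that succeeded.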
Just my 2 cents :).
This came up again on Zulip for Precondition Failed - https://chat.fhir.org/#narrow/stream/179159-cds-hooks/topic/Precondition.20Failed