Implement common HTTP error message formats for all Aerie services

Open dandelany opened this issue 4 months ago • 1 comments

Background

@AaronPlave, @duranb, and I have been working on improving error messaging across Aerie. One major impediment is that there is no common format for errors returned from HTTP services: some respond with a plain string, others use JSON but with different fields, etc.

Proposal

We propose that all HTTP endpoints on Aerie services (with some exceptions below) should respond in the following way when they encounter an error/fatal problem:

  • They should respond with an error status code (4xx or 5xx, not 200)
  • The response body should contain a single JSON object with the following properties:
    • required fields:
      • type: very short, semi-human-readable string representing the category/class/type of error, in all caps and underscores, e.g. “INVALID_SIMULATION_ID”
      • message: short (1-2 sentences) human-readable string explaining the cause of the error
      • timestamp: ISO 8601 UTC timestamp string recording when the error occurred
    • optional fields:
      • service: optional string identifying the backend service that threw the error
      • cause: longer human-readable string, explaining detailed cause of error & any recommendations to fix
      • trace: stack trace of error (in any language), to be printed in monospace font in the UI & collapsible
      • data: optional unstructured data object with any additional useful error data
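
As a rough sketch of how the fields above might be typed, here is one possible TypeScript shape plus a small builder. The names AerieError and makeAerieError are hypothetical and do not exist in Aerie today; only the field names come from the proposal. The all-caps check on type is one assumed reading of "all caps and underscores" (it also permits digits).

```typescript
// Hypothetical type name; field names follow the proposal above.
interface AerieError {
  type: string; // category of error, e.g. "INVALID_SIMULATION_ID"
  message: string; // short human-readable explanation
  timestamp: string; // ISO 8601 UTC string
  service?: string; // backend service that threw the error
  cause?: string; // longer explanation and recommendations
  trace?: string; // stack trace, in any language
  data?: Record<string, unknown>; // additional unstructured data
}

type AerieErrorOptions = Partial<Omit<AerieError, "type" | "message" | "timestamp">>;

// Hypothetical helper: validates the type name and fills in the timestamp.
function makeAerieError(
  type: string,
  message: string,
  optional: AerieErrorOptions = {},
  now: Date = new Date(),
): AerieError {
  // Assumption: "all caps and underscores" also allows digits after the first letter.
  if (!/^[A-Z][A-Z0-9_]*$/.test(type)) {
    throw new Error(`Error type must be all caps and underscores, got "${type}"`);
  }
  // Date.prototype.toISOString() always produces a UTC ISO 8601 string.
  return { type, message, timestamp: now.toISOString(), ...optional };
}
```

A service written in another language would need an equivalent of this; the point is only that timestamp can be generated in one place rather than by each call site.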

Example

{
  "type": "INVALID_ACTIVITY_PARAMETER_TYPE",
  "message": "The value \"seven\" is not a valid value for activity parameter \"dataRate\" (expected number)",
  "timestamp": "2025-08-28T00:15:03.678Z",
  "service": "merlin-worker",
  "cause": "Activity parameter values must match the type assigned to them in the mission model, defined by ....",
  "trace": "Exception in thread \"main\" java.lang.NullPointerException: Cannot invoke \"String.length()\" because \"str\" is null\n    at com.example.MyClass.myMethod(MyClass.java:10)\n    at com.example.MyClass.main(MyClass.java:5)",
  "data": {"value": "seven"}
}

Exceptions

Endpoints accessed by Hasura Actions

A slightly different error format is required for any Aerie service endpoints which are accessed by Hasura Actions (as opposed to via the UI or other services). This is because Hasura does its own thing to wrap errors, and when doing so it ignores properties other than 'message' and 'extensions'. Therefore we propose for these endpoints:

  • responses should return a JSON object with the format: {message: "...", extensions: {...}}
  • …where the extensions object otherwise matches the structure above (except message, which sits outside extensions)
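
The conversion between the two shapes could be as small as the following sketch. The names AerieError, HasuraActionErrorBody, and toHasuraActionErrorBody are hypothetical; the only assumption about Hasura is the one stated above, that it keeps message and extensions and ignores everything else.

```typescript
// Hypothetical type mirroring the standard error format proposed above.
interface AerieError {
  type: string;
  message: string;
  timestamp: string;
  service?: string;
  cause?: string;
  trace?: string;
  data?: Record<string, unknown>;
}

// Hypothetical type for the body returned by endpoints that Hasura Actions call.
interface HasuraActionErrorBody {
  message: string;
  extensions: Omit<AerieError, "message">;
}

// Move every field except message under extensions.
function toHasuraActionErrorBody(error: AerieError): HasuraActionErrorBody {
  const { message, ...extensions } = error;
  return { message, extensions };
}
```

Keeping this as a single conversion step would let a service build errors in the standard format everywhere and reshape them only at the endpoints Hasura calls.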

Hasura Actions

Hasura Actions themselves respond slightly differently: since they may call multiple action endpoints, they wrap the previous format in an errors array, e.g.:

{errors: [{message: "...", extensions: {...}}]}

Note that this is outside of our control (Hasura does it).
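
On the consuming side, a client could flatten each entry of the errors array back into the standard shape. This is a minimal sketch with hypothetical names (HasuraErrorEntry, HasuraErrorResponse, flattenHasuraErrors). It assumes each entry carries message and possibly extensions as described above, and it keeps whatever fields appear in extensions without checking them.

```typescript
// Hypothetical types for the response a client receives from a Hasura Action.
interface HasuraErrorEntry {
  message: string;
  extensions?: Record<string, unknown>;
}

interface HasuraErrorResponse {
  errors: HasuraErrorEntry[];
}

// Lift the fields in extensions to the top level, next to message.
// Entries without extensions become objects containing only message.
function flattenHasuraErrors(body: HasuraErrorResponse): Array<Record<string, unknown>> {
  return body.errors.map(({ message, extensions }) => ({ ...extensions, message }));
}
```

Since the result is an array, the caller still has to decide how to present more than one error at a time.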

Implementation

Our plan for implementing this is:

  1. Get consensus among dev team that this is the desired format (done)
  2. Implement support in aerie-ui (and any other dependent services) for the new error format, without deprecating old formats yet
  3. Incrementally update Aerie services, modifying their error responses to match the new format
  4. Deprecate support for old error formats

@AaronPlave will be leading this effort, expected to happen over multiple PRs.
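
For step 2 of the plan, the UI needs to accept both old and new formats during the transition. One way to do that is a single normalizing function, sketched below. Everything here is an assumption for illustration: the names NormalizedError and normalizeErrorBody, the fallback type "UNKNOWN_ERROR", and the choice to stamp legacy errors with the time the response was received are not part of the proposal.

```typescript
interface NormalizedError {
  type: string;
  message: string;
  timestamp: string;
  [extra: string]: unknown;
}

// Hypothetical fallback type for responses that predate the new format.
const UNKNOWN_TYPE = "UNKNOWN_ERROR";

function normalizeErrorBody(rawBody: string, receivedAt: Date = new Date()): NormalizedError {
  const fallbackTimestamp = receivedAt.toISOString();
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch (_e) {
    // Legacy case: the service responded with a plain string.
    return { type: UNKNOWN_TYPE, message: rawBody, timestamp: fallbackTimestamp };
  }
  if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
    const obj = parsed as Record<string, unknown>;
    const { type, message, timestamp } = obj;
    if (typeof type === "string" && typeof message === "string" && typeof timestamp === "string") {
      // New format: pass through, including any optional fields.
      return { ...obj, type, message, timestamp };
    }
    // Legacy case: JSON with other fields. Keep the original object under data.
    return {
      type: UNKNOWN_TYPE,
      message: typeof message === "string" ? message : rawBody,
      timestamp: fallbackTimestamp,
      data: obj,
    };
  }
  // JSON scalar or array, e.g. a quoted string.
  return {
    type: UNKNOWN_TYPE,
    message: typeof parsed === "string" ? parsed : rawBody,
    timestamp: fallbackTimestamp,
  };
}
```

Once step 4 is reached, the legacy branches could be removed and the function would reduce to a validity check on the new format.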

dandelany avatar Aug 28 '25 00:08 dandelany

We'll need to dig further into scenarios where multiple errors are returned: what we do now, and what we propose for handling them. We should also cover the case of Hasura Actions that do not return an error but instead return other data indicating a failure or problem of some sort.

AaronPlave avatar Sep 22 '25 22:09 AaronPlave