adr [ADR] Universal Error Specification

Interested?

Every comment, every link to any information about it, and every opinion is appreciated (including the opinionated and strong ones), as small as it could be.

Please participate.

Nov 03 '22 05:11 yordis

This pull request has been automatically marked as "Withdrawn". If this pull request is still relevant, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize reviewing it yet. Your contribution is very much appreciated!.

Jun 10 '23 01:06 stale[bot]

How could we tell if a given error is "safe" to expose at the edge to the client vs. being an internal error only?

Safe is decided by the domain-specific level, whatever that means.

Jul 19 '24 18:07 yordis

What about PII tagging of a given metadata?

Jul 20 '24 06:07 yordis

@dallashall let me follow up on your notes; deeply appreciate, in the meantime; do not focus on the text as much since I do not want to enforce any formatting of the values but the structure itself for now

Aug 23 '24 16:08 yordis

Hey folks, I do not want to remove all the docs written, but keep in mind that I do want to enforce any formatting of the values itself and be a freeform for the most part; leaving to another ADR to enforce formatting; expecting that each company opt-into different things

Aug 26 '24 00:08 yordis

@dallashall about having In-system and Out-system, let me speak my mind; I am unsure.

I am unsure about the value of having two specs (ignore implementation) since they could have the same metadata.

That said, we must find a way to tag the error as internal; otherwise, the system would map the mistakes to a generic one.

Should we add a dimension that marks the error as internal only? By the way, I came across the need for the metadata level of visibility when considering GDPR.

This is purely from the spec perspective; in code, you could define two classes, InternalError and PublicError, which will enforce such dimensions to be correctly set up. The good part of having a spec is that such low-level SDKs could be done once and avoid mistakes around the topic.

Again, I do not see the point of having two specs, but I agree with the problem of exposing a given error incorrectly.

The links you shared (which are excellent links, thank you so much) mentioned multiple that I am talking about:

Manage exceptions in a centralized manner to avoid duplicated try/catch blocks in the code https://owasp.org/www-project-developer-guide/draft/design/web_app_checklist/handle_errors_and_exceptions/

That means that we never expose debug_info to the clients, and if we add the extra internal dimension, we could transform it into a generic INTERNAL error!

Let me know what you think.

Aug 26 '24 00:08 yordis

Added Visibility to the spec

Aug 26 '24 00:08 yordis

@dallashall about having In-system and Out-system, let me speak my mind; I am unsure.

I am unsure about the value of having two specs (ignore implementation) since they could have the same metadata.

That said, we must find a way to tag the error as internal; otherwise, the system would map the mistakes to a generic one.

Should we add a dimension that marks the error as internal only? By the way, I came across the need for the metadata level of visibility when considering GDPR.

This is purely from the spec perspective; in code, you could define two classes, InternalError and PublicError, which will enforce such dimensions to be correctly set up. The good part of having a spec is that such low-level SDKs could be done once and avoid mistakes around the topic.

Again, I do not see the point of having two specs, but I agree with the problem of exposing a given error incorrectly.

The links you shared (which are excellent links, thank you so much) mentioned multiple that I am talking about:

Manage exceptions in a centralized manner to avoid duplicated try/catch blocks in the code https://owasp.org/www-project-developer-guide/draft/design/web_app_checklist/handle_errors_and_exceptions/

That means that we never expose debug_info to the clients, and if we add the extra internal dimension, we could transform it into a generic INTERNAL error!

Let me know what you think.

I think the case for having two separate specs is that there are two audiences with very different requirements.

This is highlighted in the discussion on localization:

Should I always include localized_message field?

No, you should only include the localized_message field when the error is meant to be displayed to the end-user. Most likely the field will be added at the edge of the system, aka application layer, where you are aware of the end-user actor and the end-user language. It is anti-pattern to add the localized_message field at the core of the system, aka domain layer, where you are not aware of the end-user actor and the end-user language, unless you are building a system that is about translations.

In one representation, we have a best practice (include localized messaging for external clients), but also an anti-pattern in the other representation (we should not include localization inside the system).

A similar logic happens around the physical structure of the error for errors that leave the system:

You MUST use the application/universal-error+json mime type content-type HTTP header when sending the error over the wire as a JSON.

In my mind, this ignores other ways of sending JSON/JSON-like data that are not tied to "normal" http requests (WebSocket messages, SSEs, GSON/gRPC). So rather than specifying this kind of rule in a specification like this, it makes more sense to live on its own in a spec that can handle questions about standards around the structure and protocol-specific rules to follow.

Looking at the section for input/form validation, I have a similar concern, but input validation is much more nuanced. As a common example: for unauthenticated users, I would hope provide error details on structure only (type, length, missing, etc), otherwise sticking with generic errors to minimize different attack strategies by bad actors. Even for authenticated users, a request to a document owned by another account should not differentiate between 404 not 401.

The way I think about it, form/input validation is actually a refinement of handling type errors (or data structure errors) in general, and from what you have so far, the intent is that type errors should:

Show what the expected type was
Show what the received type was
Show where the mismatch(es) occurred (causes, in your example)

I don't think this is actually limited to external/UI interfaces, but is useful in all parts of a system where messages are passed/consumed.

I think this would be easier to review if we started at the top with a more value-driven purpose for the spec. You highlighted a lot of problems that we hope to solve, but I think if have clearer answers to a few questions, it will help guide our spec definition.

Some example questions (based on my reading of the spec so far):

If implemented, what are the main benefits to a common developer?
- Developer does not have to reinvent/learn a new error specification or error handling pattern
- Logs will contain consistent data between parts of a system and event between separate systems
- Developers can understand and debug a system more easily because
  - Logs have information about where in the system an error occurred (Domain)
  - Logs have information about what the error is (Code)
  - Logs show a unique ID for errors (useful in searching/tracing, especially alongside causes)
  - Logs show what the error is (human readable message)
  - Logs give insight into code-level details (stack traces, utilizing cause + detailed type mismatch details, etc)

What I would like to see, as a separate spec for errors that leave a system:

If implemented, what are the main benefits to a common developer?
- Developer reduces risk of sending sensitive data (this would be my prime motive for adopting)
- Errors have reasonable, localized meaning to an end user/system, ideally providing enough information for self correction ("Passwords must contain 12 characters....", "Field 'amount' MUST be formatted as a string of USD, like '$2002.99'")
- Errors do not reveal system details (data-store implementations, messaging patterns, even service providers)
- Errors minimize the risk of OWASP-identified attack patterns (sniffing for not-found vs unauthorized, as an example)
- External errors may reference or be referenced in system level errors via ID (maybe... the idea is that an external error should still be logged in a way to be searchable/connectable to system level errors by developers)

Even though its out of scope here, I've seen several standards on handling PII (personally identifiable information) that specify only logging IDs of objects which would contain the information. I have mixed feelings on that kind of approach, but from a data-deletion standpoint, it makes sense. I've also seen strategies that store encrypted values, and devs check-out the decryption key (which is rotated based on time). Again, out of scope, but putting here for kicks.

Aug 30 '24 10:08 dallashall