problem Ability to set Cause (any Throwable) but not serialize it

Detailed Description

We want to have problem's cause only for the logging purposes. It could expose some details that API must not be aware of (like SQL query used or some internal state).

There are 2 problems currently to implement this:

ThrowableProblem's cause cannot be any Throwable. It must be an instance of ThrowableProblem. But usually the root cause of our problem is some business exception that includes some additional fields and custom message (that includes these fields), suitable for logging, but not for end-user.
cause is always serialized. AdviceTrait.isCausalChainsEnabled() method controls only setting cause of the problem. But if problem intentionally contains a cause for the logging purposes, this cause can't be excluded.

Context

This is important because we want our logs to include important information for debugging purposes in the case when DEBUG log level is active. We plan to override AdviceTrait.log(...) (it's a Spring app) to be able to log full stacktrace even in the case of 4xx error (if DEBUG or TRACE is active). The full stacktrace will include caused by, which will include more detailed technical message. At the same time we don't want to expose possibly secret and technical details to the end-user.

We plan to return business exceptions from our domain layer of microservice. These business exceptions (extends RuntimeException, but doesn't extend/implement any Zalando classes/interfaces) will contain full details, but are not intended to be serialized and returned from API AS-IS. Instead we want to create another set of business exceptions (zalando problems), and they will include "root" business exceptions as cause (only for the logging/debugging purposes).

Maybe we are approaching the problem from the wrong end... The alternative approach is to make business exceptions (thrown from domain layer) to be Zalando Problems (and, respectively, be JSON-serializable). But this does mean that they will not include any details (not intended for end-user, but only for logging) in their message. Of course, we could include detailed message as another field into these classes and override getMessage() for the logging purposes to include both user-faced and technical message.

But I there are other cases.

We want to return different HTTP Status for the same Exception depending on the business operation that was trying to be done. E.g. UserNotFoundProblem. If it was getUserProfile operation, then it's reasonably to return 404. But if the same exception was thrown when addPhotoToUserGallery operation (if someone tries to add photo to the gallery of non-existing user), then 400 is more appropriate I think.
Domain method, that executes operation can (and should) know nothing about interface of API, that was used to call this domain method. It doesn't know field names, JSON structure of request etc. So, it's hard to create readable and easily understandable error messages for e.g. "validation" errors (I speak here not about simple bean-validation, but about some more sophisticated validation, that includes reading current data from DB etc, like e.g. trying to upload a new photo when count limit exceeded). It's also possible that one business exception will be thrown from two different API methods with different shapes of JSON-request, so we can't just hardcode field names into exception message.
Some domain operation can be part of more than one business-operation. For example, some domain-method (let's say it would be createThumbnail(File image)) can throw InvalidImageException (or InvalidImageProblem) when image argument doesn't contain a valid image. But in some cases it can be image, that comes from API request (and then this should lead to JSON response with "invalid image error"), but in other cases it can be image, that comes from DB (and then this should lead to JSON response with "internal server error"). Domain method can't and shouldn't know where this image come from, so it can't throw *Problem here because it doesn't know what problem should it select. So, we inevitably need 2 layers of business-exceptions: domain-level (generic) and API-level (zalando)?

Can I ask, if there is any intended way to use Zalando Problem library in microservice? Like, should we have a separate business exceptions without any relation to Zalando Problem and throw them from domain layer, and translate them to Zalando Problems only in the "controller/service" layer? Or should we preferably have only one set of business exceptions (that extends from AbstractThrowableProblem or implements Exceptional), and throw them from domain layer as is? Maybe we just "fight the framework" here.

Background: I speak about Spring application here and use problem-spring-web library too.

Sep 07 '20 19:09 xak2000

Like, should we have a separate business exceptions without any relation to Zalando Problem and throw them from domain layer, and translate them to Zalando Problems only in the "controller/service" layer?

That's at least the use case I had in mind when designing it and that is fact the preferred mode of operation for microservices at Zalando.

Sep 07 '20 20:09 whiskeysierra

That's at least the use case I had in mind when designing it and that is fact the preferred mode of operation for microservices at Zalando.

Then, the problem with logging arises.

It's hard to come up with some central place to log all handled business exceptions (with details and trace) if Problem loses it's relationship to business exception, that caused it.

Maybe it's only inherent to Spring's exception handling approach, I don't know. But at least in Spring (and traits implementation of problem-spring-web library), at the time of central exception logging, there is no trail of business exception, that caused currently handled Problem.

So, if we don't pass the business exception as cause of Problem, we unable to log it. Even more: it should be cause (and not e.g. separate field) to make SLF4J automatically log it's message and trace (after caused by) together with Problem.

Or... we can log only the cause (business exception), totally ignoring the Problem (pass business exception as the last argument of SLF4J method). This can work, as in this case there is no need to put it into cause field of Throwable (it could be any field of Problem) and business exception usually already contains more detailed exception and more useful stacktrace, than Problem, created in the controller/exception-handler. But, maybe in this case it would be harder to correlate messages in the log with messages in the response while debugging, IDK.

@whiskeysierra Do you use Spring and central logging of business exceptions at Zalando? Any advice can help, if it possible. :)

Thanks in advance!

Sep 07 '20 20:09 xak2000

Do you use Spring and central logging of business exceptions at Zalando?

Yes we do. We have a global catch block around our domain/business logic where we catch and log our business exceptions because:

We have the full context and causal chain available.
We know that the layers above the domain (API, Tests, UI, etc), by design can't handle those exceptions. They merely translate based on their individual needs (API translate to problems, UI to popups or whatever and Tests to failures).

There are two things to be taken from this:

We effectively use hexagonal architecture in our services and that allows us the clear boundary between domain and the rest.
We don't really make use of the logging feature within problem-spring-web, because of: a. We would log the exception twice, usually. b. We are missing information (as you correctly pointed out).

Sep 09 '20 23:09 whiskeysierra

Does that work for you?

Sep 10 '20 00:09 whiskeysierra

We have a global catch block around our domain/business logic where we catch and log our business exceptions

Just single global catch block for each microservice or single block for each domain/business method execution?

I just can't imagine: at what point in the Spring stack that "single global catch block" can live. The only way I can think of is to override ExceptionHandlerExceptionResolver and implement it's resolveException method to log the exception and then delegate it's handling to the standard ExceptionHandlerExceptionResolver. But this way it would be logged only if business-method was called by API layer, so will not work as you described (hexagonal architecture).

The other possible way I can think of is maybe AOP.

Can you clarify slightly more? If this is not something you can talk about, then it's ok. You already helped much.

I think, I already found a solution to the original problem: overriding ExceptionHandlerExceptionResolver and logging exception before handling it by @ExceptionHandler and before translating it to Problem should work for us. This way an original business exception can be logged and logging feature within problem-spring-web can be disabled/overriden to not log double time a single case.

But you mentioned hexagonal architecture and my solution will not work, as it is closely related to API layer. If I would want to add UI layer for example, I'll need to create another solution to log business exceptions, when there is no ExceptionHandlerExceptionResolver in the equation. So your words made me think... :)

Sep 10 '20 18:09 xak2000

I just can't imagine: at what point in the Spring stack that "single global catch block" can live.

In a properly designed hexagonal architecture, spring is not playing a significant role, if any, within the domain/business logic. Maybe apart from providing a dependency injection framework.

The other possible way I can think of is maybe AOP.

That's it, in a nutshell, yes. We have a dedicated GlobalCatchBlock class which contains all the rules about log level and format based on exception type. E.g., business exceptions are logged as warnings without stack traces (since they are expected in our case and not worth raising our exception count which is based on error logs). Any other exceptions is logged as an error with stack traces. Obviously, those rules are specific to us. Other teams/companies will have their own.

We are feeding this class primarily via an AOP aspect, but sometimes also directly.

But you mentioned hexagonal architecture and my solution will not work, as it is closely related to API layer. If I would want to add UI layer for example, I'll need to create another solution to log business exceptions, when there is no ExceptionHandlerExceptionResolver in the equation.

That's why we tried not to bind us to something Spring Web specific here. Our domain logic has adapters for numerous integrations: Spring Web REST APIs, Apache CXF SOAP APIs (legacy, but still in use, iirc), asynchronous message queues/systems, Batch-Job and background processing. All of those form entry (and exit) points into our system where we want to control the boundary, which includes error handling.

Sep 11 '20 01:09 whiskeysierra

@whiskeysierra Thanks you very much for detailed response!

That makes sense! I understand the benefits but was struggle with technical solution to this GlobalCatchBlock implementation. You confirmed that AOP is viable solution, so I will try to implement it using AOP. For example, some annotation on business classes/methods will trigger global catching/logging.

Thank you!

I think that this issue is not relevant then. The requirement mentioned in the title is only arises when business/domain logic throws Zalando exceptions (Problems), and that's not a very good decision to do it (as we found out).

I think this issue can be closed.

The confusing place in the documentation is here:

If you already have an exception class that you want to extend, you should implement the "marker" interface Exceptional:
public final class OutOfStockProblem extends BusinessException implements Exceptional

In this example BusinessException becomes Problem, so looks like the approach of using Zalando problems as business exceptions is the recommended way. But after this discussion I understand, that the recommended way is to separate business-exceptions and problems and to throw problems only from API layer, but not from business/domain layer.

Maybe it is worth to be mentioned in the documentation. I understand that it's decision of concrete teams, but as you mentioned early:

That's at least the use case I had in mind when designing it and that is fact the preferred mode of operation for microservices at Zalando.

So, if someone will try to use this library "in a wrong way", then he will inevitably run into problems.

Sep 12 '20 10:09 xak2000

Agreed, the wording is far from perfect. Interested in drafting a PR to improve that?

Sep 22 '20 19:09 whiskeysierra

I want to help to improve this, of course. But I'm sure you noticed that english is not my native language and I'm very bad at it, especially grammar. :) So it's hard for me to write documentation.

The other problem is that I'm unsure what to write and at what section of documentation to make it clear how this library is intended to be used. Maybe it should be done in a new section with a title like "Intended usage", IDK...

The only idea I have in mind right now is to replace BusinessException in the mentioned place of the docs with e.g. ApiException. At least it will be less confusing, I think. But I feel that this still is not enough. It should be explicitly mentioned that you should not try to throw Problems from your business-layer, only from API layer. But then, it sounds like a dogma, while it's actually a recommendation, as this is the approach that author of this library (you) have in mind when created it.

Sep 22 '20 22:09 xak2000