seastar
seastar copied to clipboard
Exception printout is too verbose
In the Scylla project we often print exceptions using Seastar's logging. Here is an example error message, where the entire text in bold is one exception:
ERROR 2022-03-21 14:33:13,657 [shard 0] init - Startup failed: std::_Nested_exceptionstd::runtime_error (Failed to set up Alternator TLS credentials): std::_Nested_exceptionstd::runtime_error (Could not read certificate file conf/scylla.crt): std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: open failed: No such file or directory [conf/scylla.crt])
This is ugly, and I want to suggest that maybe most of the type information is redundant and not helpful to end-users. Users of the application don't know what to make of internal C++ type names like std::_nested_exception<std::runtime_error>
. In this case simple punctuation like a colon could replace the entire "nested exception" song and dance (see example below). Even std::filesystem::__cxx11::filesystem_error is not helpful, and arguably so is the sentence "error system:2" that we go out of our way to print (in log.cc
, std::ostream& operator<<(std::ostream& out, const std::exception_ptr& eptr)
) is also not helpful. Rather, the what()
of this error would have been enough.
In the above example, I would argue that the following shorter version of the same message would be much easier to understand:
ERROR 2022-03-21 14:33:13,657 [shard 0] init - Startup failed: Failed to set up Alternator TLS credentials: Could not read certificate file conf/scylla.crt: open failed: No such file or directory [conf/scylla.crt]
Perhaps for some rare types of exceptions it can be helpful to print their name, but arguably for types which implement a what()
(like runtime_error and filesystem_error), only that should be printed and not the exception's type. But getting rid of the "nested exception" text is probably the first thing we should eliminate.
I agree that the _nested
stuff is redundant but I think we should keep the whole hierarchy as it gives us the context of cause
and effect. Maybe we should only write the template type of nested and make it look something like the following:
runtime_error (Failed to set up Alternator TLS credentials) <-- runtime_error (Could not read certificate file conf/scylla.crt) <-- std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: open failed: No such file or directory [conf/scylla.crt])
I agree that the
_nested
stuff is redundant but I think we should keep the whole hierarchy as it gives us the context of cause and effect. Maybe we should only write the template type of nested and make it look something like the following:runtime_error (Failed to set up Alternator TLS credentials) <-- runtime_error (Could not read certificate file conf/scylla.crt) <-- std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: open failed: No such file or directory [conf/scylla.crt])
If you were a user of some software, not one of its developers, would you consider the string "std::filesystem::__cxx11::filesystem_error" helpful in an error message?, or give you any sort of context? I doubt it. Would you consider the string "runtime_error" to helpful in what is quite obviously an error at runtime? :-)
I admit that in some cases, some of these type names may help us, developers, because if you see the message "No such file or directory" maybe you don't know where it came from and the type name gives you one small clue on what to grep for. But I argue that 1. this information isn't helpful enough anyway (it doesn't say where the exception came from, just one small detail of its type), and 2. if we want to add more information about where an exception came from we should add it in English text, not C++ type names (and in fact the "nested exception" thing is exactly this).
If you were a user of some software, not one of its developers, would you consider the string "std::filesystem::__cxx11::filesystem_error" helpful in an error message?, or give you any sort of context I doubt it. Would you consider the string "runtime_error" to helpful in what is quite obviously an error at runtime? :-)
I agree that this error is for developers. However, this should be a last resort when you haven't add a dedicated text that explains in plain english what wnet wrong. Your argument holds true also for backtraces (comming from coredumps / stalls) nevertheless it is a fact that if we didn't have those things we wouldn't have been able to find any bugs filed by users without reproducing them first. We could tell the user - "Stall detected!!!" and thats it but how would that have helped?
I do agree that we should aspire to minimize the occurrences of such prints and if we can detect the error in advance and issue a plain english message it is better. But there are simply cases that you haven't anticipated. So in a sense those prints are for developers and users should open bugs quoting them so the developers can act on them.
I this particular case by the way, we should have detected already in the configuration reading state that we were provided with a non exiting certificate file. And yes - instead of providing to the TLS stack an invalid argument which resulted in this exception, just notify the user. The bottom line, those exception prints are there for two reasons:
- As a fail safe to when we haven't anticipated something that we could or couldn't log in a friendlier way.
- As a logging information for giving developers hints (by the user) when something slips through and gets to be printed that way. Makes sense?
One more note, it is true that blindly printing those types can result in a ridiculous verbosity, however, when this is the only lead a developer has for some bug, IMHO over verbosity is better than under verbosity.