
[Nextclade Web] Reporting of errors and warnings is unclear

Open · ivan-aksamentov opened this issue 2 years ago · 0 comments

  • Warnings about untranslated genes appear in nextclade.error.csv, but not in the main nextclade.tsv file.

  • The nextclade.tsv file contains an errors column, where alignment errors are reported.

  • Nextclade currently does not have a clear hierarchy of the various "something bad happened" events. The labels "error" and "warning" are assigned rather arbitrarily, and in the past we have renamed errors to warnings and vice versa.

  • There is no clear connection between QC and errors. Is a very bad QC score an error? A warning? A separate kind of event? Conversely, is an error or a warning a QC event? Are they two partially overlapping sets of events?

  • Some serious failures are not signaled clearly enough. For example, frame shifts currently prevent genes from being translated, so AA mutations do not appear for these genes, while the sequence can still be marked as "good" by QC. We hope to fix the frame shift problems soon, but similar gene-level problems may still arise and will need to be reported.
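One possible way to give these events a single hierarchy is to put QC status and error/warning severity on one ordered scale and report the worst of them for each sequence. The sketch below assumes C++17; Severity, Event, QcStatus and overallSeverity are hypothetical names chosen for illustration, not existing Nextclade types. Under this scheme, a sequence whose QC is "good" but which has an untranslated gene is still reported with error severity.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical severity scale, ordered from harmless to fatal.
enum class Severity { Info = 0, Warning = 1, Error = 2, Fatal = 3 };

// Hypothetical event record: which stage raised it, for which gene (if any), and how bad it is.
struct Event {
  std::string stage;    // e.g. "alignment", "translation", "qc"
  std::string gene;     // empty when the event is not gene-specific
  std::string message;
  Severity severity;
};

// Hypothetical QC status, mapped onto the same scale so QC and errors are comparable.
enum class QcStatus { Good, Mediocre, Bad };

Severity qcToSeverity(QcStatus qc) {
  switch (qc) {
    case QcStatus::Good: return Severity::Info;
    case QcStatus::Mediocre: return Severity::Warning;
    case QcStatus::Bad: return Severity::Error;
  }
  return Severity::Error;  // only reached for out-of-range enum values
}

// Overall status of a sequence: the worst of its QC status and all recorded events.
// A sequence with "good" QC but an untranslated gene is therefore not reported as clean.
Severity overallSeverity(QcStatus qc, const std::vector<Event>& events) {
  Severity worst = qcToSeverity(qc);
  for (const auto& e : events) {
    worst = std::max(worst, e.severity);
  }
  return worst;
}
```

Whether QC "bad" should map to Error or only to Warning is exactly the kind of policy question the issue raises; the point of the sketch is only that both end up on one comparable scale.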

All this is very confusing for users who try to find out what happened to their bad or incomplete sequences. QC is one of the main goals of Nextclade, and I believe we need to clarify and standardize errors and warnings, accumulate them, and emit them in a single place.
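A minimal sketch of the "accumulate and emit in a single place" idea, assuming C++17: every pipeline stage appends to one per-sequence log, which is then rendered into a single output field instead of being split between nextclade.tsv and nextclade.error.csv. EventLog, warn, error, hasErrors and toField are hypothetical names for illustration only.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical severity labels used in the output field.
enum class Severity { Warning, Error };

struct Event {
  Severity severity;
  std::string stage;
  std::string message;
};

// Hypothetical per-sequence collector: every stage appends to the same list
// instead of writing to its own output file.
class EventLog {
 public:
  void warn(const std::string& stage, const std::string& msg) {
    events_.push_back({Severity::Warning, stage, msg});
  }
  void error(const std::string& stage, const std::string& msg) {
    events_.push_back({Severity::Error, stage, msg});
  }

  bool hasErrors() const {
    for (const auto& e : events_) {
      if (e.severity == Severity::Error) return true;
    }
    return false;
  }

  // Renders all events, in the order they were recorded, as one "; "-separated field.
  std::string toField() const {
    std::ostringstream out;
    for (std::size_t i = 0; i < events_.size(); ++i) {
      if (i > 0) out << "; ";
      const Event& e = events_[i];
      out << (e.severity == Severity::Error ? "error" : "warning")
          << "[" << e.stage << "]: " << e.message;
    }
    return out.str();
  }

 private:
  std::vector<Event> events_;
};
```

Rendering into one field is just one option; the same log could equally be written as separate columns or as a nested array in JSON output.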

Some of the technical challenges involved:

  • The errors and warnings are scattered across different parts of the codebase.

  • Some of these errors and warnings are actual exceptions in the code, which interrupt the normal program flow (e.g. if alignment fails, none of the subsequent steps can run); others are just events we need to record and keep around throughout execution. There is no clear correspondence between the severity of a C++ exception or warning and its domain-specific interpretation.

  • The WASM module has exception handling disabled, so exceptions cannot be used for error reporting in most of the algorithm code. Return codes must be used instead, which makes error handling clunky, verbose and error-prone.

  • If an exception (or a bad return code) occurs and the program flow is interrupted, the current data layout does not always make it straightforward to gather both failed and successful results (sequences, rows) into a single array. Some refactoring might be needed.
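For the exception-free WASM build, one option is a small result type built on std::variant, combined with a row type that holds either the output or the recorded errors, so failed and successful sequences end up in the same array. This is a C++17 sketch under stated assumptions: StageError, Result, align, Row and runAll are hypothetical names, and align is a placeholder rather than a real aligner.

```cpp
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical error value returned instead of thrown (exceptions are unavailable in WASM).
struct StageError {
  std::string stage;
  std::string message;
};

// Minimal result type: either a value or a StageError.
template <typename T>
using Result = std::variant<T, StageError>;

// Hypothetical stand-in for the alignment step: fails on empty input.
Result<std::string> align(const std::string& seq) {
  if (seq.empty()) return StageError{"alignment", "empty sequence"};
  return seq;  // placeholder: a real aligner would return the aligned sequence
}

// One row per input sequence, whether processing succeeded or failed,
// so both kinds of outcome live in a single array.
struct Row {
  std::string name;
  std::string aligned;             // empty if alignment failed
  std::vector<StageError> errors;  // empty on success
};

std::vector<Row> runAll(const std::vector<std::pair<std::string, std::string>>& inputs) {
  std::vector<Row> rows;
  rows.reserve(inputs.size());
  for (const auto& [name, seq] : inputs) {
    Row row{name, {}, {}};
    Result<std::string> aligned = align(seq);
    if (const auto* err = std::get_if<StageError>(&aligned)) {
      // Alignment failed: later steps cannot run, but the row is still kept.
      row.errors.push_back(*err);
    } else if (const auto* value = std::get_if<std::string>(&aligned)) {
      row.aligned = *value;
      // Subsequent stages would run here, each appending to row.errors on failure.
    }
    rows.push_back(std::move(row));
  }
  return rows;
}
```

Compared with bare integer return codes, the variant carries the failing stage and a message along with the failure, and std::get_if lets callers check for it without throwing.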

ivan-aksamentov · Aug 27 '21 18:08