OpenMetadata Exception Management in the Topology

We usually don't want to stop the workflow from running when encountering individual errors.

One possible approach would be to manage exceptions directly on the Topology.

Each source should throw two types of exceptions (please improve the names):

RuntimeException
WorkflowException

If the topology finds a runtime exception, it will continue. If it finds a workflow exception, it will stop everything.

This way we would be certain that sources run e2e, and we would have a centralized place to decide how to respond to each case.
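
As a rough illustration of the idea above, here is a minimal Python sketch. The exception names are the placeholder names from this description (not final), and `run_topology` and the shape of `sources` are hypothetical, not existing ingestion code.

```python
class RuntimeException(Exception):
    """Placeholder name from this issue: a recoverable error, keep going."""


class WorkflowException(Exception):
    """Placeholder name from this issue: a fatal error, stop everything."""


def run_topology(sources):
    """Run every (name, callable) pair and return (results, failures).

    `sources` is a hypothetical list of (name, callable) tuples.
    """
    results, failures = [], []
    for name, source in sources:
        try:
            results.append(source())
        except RuntimeException as exc:
            # Recoverable: record the failure and continue with the next source.
            failures.append((name, str(exc)))
        # WorkflowException is deliberately not caught here, so it
        # propagates to the caller and stops the whole run.
    return results, failures
```

The only place that decides "continue or stop" is the loop in `run_topology`, which is the centralization this proposal is after.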

The primary purpose of this task is to improve the user experience when displaying errors. After analyzing the current status, we noticed that:

- How we handle parsing errors of the workflow configuration could be improved.
  - Pydantic errors are not clear. There is no guidance or information for the user when parsing fails due to missing or extra parameters. The same occurs with malformed YAML files.
- Workflows can behave differently depending on whether we run them from the CLI or the scheduler. For example, how we print the status or set the logging level.
  - We need to abstract the workflow class.
- Each topology node is divided into three main parts: producer, stages, and post-processing:
  - Producer: any error thrown here must stop the topology execution, so we have to throw a WorkflowException.
  - Stages/Post-processing: an error here doesn't affect the topology execution, since the main purpose is the extraction of different entities to be ingested/processed. In this case, we throw an EntityIngestionException.
    - These exceptions must be handled and logged in a single place instead of multiple times in different parts of the code.
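
The producer / stages / post-processing split could look roughly like the sketch below. `WorkflowException` and `EntityIngestionException` are the names used above; `Status`, `run_node`, and the callables passed in are hypothetical and only meant to show where the single handling point would live.

```python
import logging
from dataclasses import dataclass, field
from typing import List

logger = logging.getLogger("topology_sketch")


class WorkflowException(Exception):
    """Producer failure: stops the topology."""


class EntityIngestionException(Exception):
    """A single entity failed in a stage or post-processing step."""


@dataclass
class Status:
    records: List[str] = field(default_factory=list)
    failures: List[str] = field(default_factory=list)

    def failed(self, step: str, entity: object, exc: Exception) -> None:
        # The one place where entity-level errors are logged and recorded.
        logger.warning("Step %s failed for %r: %s", step, entity, exc)
        self.failures.append(f"{step}: {entity!r}: {exc}")


def run_node(producer, stages, post_process, status):
    # Producer: any error here is fatal, so wrap it in WorkflowException.
    try:
        entities = list(producer())
    except Exception as exc:
        raise WorkflowException(f"Producer failed: {exc}") from exc

    # Stages: an entity-level error is recorded and the loop continues.
    for entity in entities:
        for stage in stages:
            try:
                status.records.append(stage(entity))
            except EntityIngestionException as exc:
                status.failed(stage.__name__, entity, exc)

    # Post-processing: same policy as the stages.
    for step in post_process:
        try:
            step()
        except EntityIngestionException as exc:
            status.failed(step.__name__, None, exc)
```

In this sketch only `EntityIngestionException` is swallowed in stages and post-processing. Any other exception type still propagates, which is one possible choice rather than a settled design.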

The tasks to be done are:

- [x] Improve errors displayed in CLI when workflow parsing fails https://github.com/open-metadata/OpenMetadata/pull/7522
- [ ] Abstract workflow class
- [ ] Handle exceptions from the topology runner
- [ ] Review exceptions for each supported service:
  - [ ] Database
  - [ ] Dashboard
  - [ ] Pipeline
  - [ ] MlModel
  - [ ] Messaging
- [ ] Review exceptions for profiler workflow
- [ ] Review exceptions for test suite workflow
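
For the "Abstract workflow class" task, one possible shape is a base class that owns execution, while subclasses decide the logging level and how the status is reported. Everything in this sketch (`AbstractWorkflow`, `CliWorkflow`, `ScheduledWorkflow`, and steps returning a list of failure messages) is hypothetical and not the actual OpenMetadata API.

```python
import logging
from abc import ABC, abstractmethod


class AbstractWorkflow(ABC):
    """Hypothetical base class: shared execution, pluggable reporting."""

    log_level = logging.INFO  # subclasses override this per entry point

    def __init__(self, steps):
        # Each step is assumed to be a callable returning a list of
        # failure messages (empty when everything went fine).
        self.steps = steps
        self.failures = []

    def execute(self):
        logging.getLogger("workflow_sketch").setLevel(self.log_level)
        for step in self.steps:
            self.failures.extend(step())
        return self.report_status()

    @abstractmethod
    def report_status(self):
        """Return the status in the form the entry point needs."""


class CliWorkflow(AbstractWorkflow):
    log_level = logging.DEBUG

    def report_status(self):
        # On the CLI we print a human-readable summary.
        summary = f"{len(self.failures)} failure(s)"
        print(summary)
        return summary


class ScheduledWorkflow(AbstractWorkflow):
    log_level = logging.WARNING

    def report_status(self):
        # A scheduler would more likely want structured data.
        return {"failures": list(self.failures)}
```

Both entry points share `execute`, so the exception handling policy only has to be written once.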

Sep 06 '22 17:09 pmbrull

@pmbrull @nahuelverdugo let's begin the design/thinking process on how we want to handle this, but actual delivery should be in 0.13.

Sep 12 '22 02:09 harshach

The goal of this task is to create an EPIC with detailed information and broken-down tasks to handle the improvements. Thanks @nahuelverdugo for handling this.

Sep 16 '22 05:09 pmbrull

I have updated the task description with the list of things that must be done.

Sep 16 '22 07:09 nahuelverdugo

@pmbrull @nahuelverdugo updated the release to 0.13

Sep 16 '22 17:09 harshach

Pending:

- [x] Profiler workflow should be created from BaseWorkflow
- [x] Data Insights and Data Quality workflows should be created from BaseWorkflow
- [x] Status handling needs to be updated for the workflows above after the update
- [x] Clean the legacy Sink, Processor, and BulkSink classes in favor of using steps.*
- [x] Re-implement the test_workflow_output_handler with the new status structure

Aug 29 '23 17:08 pmbrull

Closed by https://github.com/open-metadata/OpenMetadata/pull/13471

Oct 09 '23 05:10 pmbrull