Exception Management in the Topology
We usually don't want to stop the workflow from running when encountering individual errors.
One possible approach would be to manage exceptions directly on the Topology.
Each source should throw two types of exceptions (please improve the names):
- RuntimeException
- WorkflowException
If the topology finds a runtime exception, it will continue. If it finds a workflow exception, it will stop everything.
This way, we would be certain that sources run end to end, and we would have a centralized place to decide how to respond to each case.
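A minimal sketch of the idea above, assuming the topology is just a sequence of callables. `run_topology` and the step functions are hypothetical names used only for illustration. The two exception classes use the placeholder names from this proposal (`RuntimeException` is a custom class here, not Python's built-in `RuntimeError`). This is not the actual OpenMetadata implementation.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger("topology")


class RuntimeException(Exception):
    """Recoverable error (placeholder name): log it and keep going."""


class WorkflowException(Exception):
    """Fatal error (placeholder name): the whole workflow must stop."""


def run_topology(steps: Iterable[Callable[[], None]]) -> List[str]:
    """Run every step and return the messages of the recoverable failures."""
    failures: List[str] = []
    for step in steps:
        try:
            step()
        except RuntimeException as exc:
            # Centralized handling: record the failure and move on.
            logger.warning("Step failed, continuing: %s", exc)
            failures.append(str(exc))
        # WorkflowException is deliberately not caught here,
        # so it propagates to the caller and stops the run.
    return failures
```

With this shape, the decision of whether to continue or stop lives in one place, and the sources only need to raise the right exception type.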
The primary purpose of this task is to improve the user experience when displaying errors. After analyzing the current status, we notice that:
- How we handle parsing errors of the workflow configuration could be improved.
- Pydantic errors are not clear. There is no guidance, or information, for the user when it fails due to missing/extra parameters. The same occurs with malformed YAML files.
- Workflows can behave differently depending on whether we run them from the CLI or the scheduler. For example, how we print the status or set the logging level.
- We need to abstract the workflow class
- Each topology node is divided into three main parts: producer, stages, and post-processing:
  - Producer: any error thrown here must stop the topology execution, so we have to throw a WorkflowException.
  - Stages/Post-processing: an error here doesn't affect the topology execution, since the main purpose is the extraction of different entities to be ingested/processed. In this case, we throw an EntityIngestionException.
- These exceptions must be handled and logged in a single place instead of multiple times in different parts of the code.
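The producer/stages split described in the list above could look roughly like the following. It is a sketch under assumptions: `run_node`, `Status`, and the shape of the producer and stage callables are hypothetical and only illustrate where each exception would be raised and where failures would be collected.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List


class WorkflowException(Exception):
    """Raised when a producer fails; stops the topology."""


class EntityIngestionException(Exception):
    """Raised when a stage or post-processing step fails for one entity."""


@dataclass
class Status:
    """Hypothetical single place where entity-level results are recorded."""
    records: List[Any] = field(default_factory=list)
    failures: List[str] = field(default_factory=list)


def run_node(
    producer: Callable[[], Iterable[Any]],
    stages: List[Callable[[Any], Any]],
    status: Status,
) -> None:
    """Run one topology node: the producer first, then every stage per item."""
    try:
        items = list(producer())
    except Exception as exc:
        # Without the producer's output there is nothing to process.
        raise WorkflowException(f"Producer failed: {exc}") from exc

    for item in items:
        for stage in stages:
            try:
                status.records.append(stage(item))
            except EntityIngestionException as exc:
                # One bad entity must not stop the rest of the node.
                status.failures.append(f"{item}: {exc}")
```

Collecting failures in a single `Status` object means the CLI and the scheduler can each decide how to print them, without every stage doing its own logging.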
The list of tasks to be done is:
- [x] Improve errors displayed in CLI when workflow parsing fails https://github.com/open-metadata/OpenMetadata/pull/7522
- [ ] Abstract workflow class
- [ ] Handle exceptions from the topology runner
- [ ] Review exceptions for each supported service:
  - [ ] Database
  - [ ] Dashboard
  - [ ] Pipeline
  - [ ] MlModel
  - [ ] Messaging
- [ ] Review exceptions for profiler workflow
- [ ] Review exceptions for test suite workflow
@pmbrull @nahuelverdugo let's begin the design/thinking process on how we want to handle this, but actual delivery should be in 0.13
The goal of this task is the creation of an EPIC that will have detailed information and broken-down tasks to handle the improvements. Thanks @nahuelverdugo for handling this
I have updated the task description with the list of things that must be done
@pmbrull @nahuelverdugo updated the release to 0.13
Pending:
- [x] Profiler workflow should be created from BaseWorkflow
- [x] Data Insights and Data Quality workflows should be created from BaseWorkflow
- [x] Status handling needs to be updated for the workflows above after the update
- [x] Clean the legacy Sink, Processor, and BulkSink classes in favor of using steps.*
- [x] Re-implement the test_workflow_output_handler with the new status structure
closed by https://github.com/open-metadata/OpenMetadata/pull/13471