dolphinscheduler
[DSIP-79][Task] Add Datavines task to better support data quality
Search before asking
- [X] I had searched in the DSIP and found no similar DSIP.
Motivation
Datavines is an easy-to-use data quality service platform that supports multiple metrics. https://github.com/datavane/datavines
- Datavines supports executing multiple metrics in one job.
- Datavines supports execution status dashboard and data quality report.
- Datavines supports plug-in extensions for components such as metrics, data sources, error data storage, and execution engines.
- JDBC engines can be used to execute data quality tasks instead of relying solely on Spark engines.
Design Detail
Script Mode
- Configure the data quality job in Datavines.
- Get the job config script file.
- Add a Datavines job node to the workflow and configure the script.
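As a rough illustration of the last step, the sketch below shows one way a node could validate the pasted job config script and write it to a file before handing it to Datavines. It assumes the script is JSON; the function name `prepare_script_file` and the file name `datavines_job.json` are hypothetical and are not taken from Datavines or DolphinScheduler.

```python
import json
from pathlib import Path


def prepare_script_file(script: str, work_dir: str) -> Path:
    """Validate the pasted job config and write it to a file in work_dir."""
    try:
        # Assumption: the Datavines job config script is a JSON document.
        json.loads(script)
    except json.JSONDecodeError as exc:
        raise ValueError(f"job config script is not valid JSON: {exc}") from exc
    # Hypothetical file name, chosen only for this sketch.
    path = Path(work_dir) / "datavines_job.json"
    path.write_text(script, encoding="utf-8")
    return path
```

Rejecting a malformed script before the job is submitted would let the node fail fast inside the workflow rather than inside Datavines.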
API Mode
- Configure the data quality job in Datavines.
- Get the jobId.
- Add a Datavines job node to the workflow and configure the Datavines API address and jobId.
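As a rough illustration of API mode, the sketch below models the two values configured on the node (the Datavines API address and the jobId) and builds, without sending, the HTTP request that would trigger the job. The endpoint path, the POST method, and the names `DatavinesApiParams` and `build_execute_request` are assumptions made for illustration; they are not taken from the Datavines API.

```python
from dataclasses import dataclass
from urllib.parse import urljoin
from urllib.request import Request

# Hypothetical path; the real Datavines endpoint may differ.
EXECUTE_PATH = "api/v1/job/execute/{job_id}"


@dataclass(frozen=True)
class DatavinesApiParams:
    """Values a user would configure on the node in API mode."""
    address: str  # base URL of the Datavines API server
    job_id: int   # id of the job already configured in Datavines

    def validate(self) -> None:
        if not self.address.startswith(("http://", "https://")):
            raise ValueError("address must be an http(s) URL")
        if self.job_id <= 0:
            raise ValueError("job_id must be a positive integer")


def build_execute_request(params: DatavinesApiParams) -> Request:
    """Build (but do not send) the request that would trigger the job."""
    params.validate()
    base = params.address.rstrip("/") + "/"
    url = urljoin(base, EXECUTE_PATH.format(job_id=params.job_id))
    return Request(url, method="POST")
```

Keeping request construction separate from sending makes the node's configuration easy to check in isolation; an actual task would also need to poll the job status and support stopping the job.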
Compatibility, Deprecation, and Migration Plan
No response
Test Plan
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
It would be nice to be able to submit a task here, see its status in DolphinScheduler, and stop it via Datavines.
Very useful for data pipelines.
If Datavines is integrated into DolphinScheduler, it will be easier to combine project management and data inspection.
+1
You should provide a detailed design of how to use the new task and how the task works in DolphinScheduler, rather than just some pictures of the UI.
OK, I will add the detailed design.
Hi, are you still working on this?
I will take this on.
Until the new task plugin is completed, shell tasks can be used to integrate Datavines; see the following guide: https://datavane.github.io/datavines-website/docs/integration/dolphin-scheduler
Since Datavines can run tasks independently, why use DolphinScheduler at all? In the future, DolphinScheduler might serve only as a scheduler.