datahub
datahub copied to clipboard
feat(ingest): add loading MSSQL Job as dataFlow, job step as dataJob …
…and stored procedures as dataJob
Co-Authored-By: Oleksandr [email protected]
Summary
We add loading dataFlow and dataJob entities. We add configuration flags for separated work with entities.
Changes
- Add MSSQL Job as dataFlow;
- Add MSSQL Job Step as dataJob with properties:
Property | Example |
---|---|
command | EXEC [DataBaseName].[SchemaName].[ProcedureName] |
date_created | 1990-01-01 00:00:00.000000 |
date_modified | 1990-01-01 00:00:00.000000 |
description | Full description |
job_id | 98765432-1987-6543-2198-7654321987654 |
job_name | JobName |
step_id | 1 |
step_name | StepName |
subsystem | TSQL |
- Add Stored Procedure as dataJob with properties;
Property | Example |
---|---|
code | CREATE PROCEDURE [SchemaName].[ProcedureName] AS BEGIN SET NOCOUNT ON; END |
create_date | 1990-01-01 00:00:00.000000 |
definition | Full description |
depending_on_procedure | {} |
input parameters | [] |
modify_date | 1990-01-01 00:00:00.000000 |
procedure_depends_on | {'SchemaName.rc.TableName': 'USER_TABLE'} |
- Stored procedures pack into dataFlow for StoredProcedures;
- Add configuration flags:
- include_stored_procedures - for load stored procedures as dataJob;
- include_code - for load stored procedure code as property;
- include_jobs - for load MSSQL Jobs as dataFlow and Steps as dataJob;
- include_descriptions - for load tables and table columns descriptions.
Checklist
- [ ] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
- [ ] Links to related issues (if applicable)
- [ ] Tests for the changes have been added/updated (if applicable)
- [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
- [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub
Unit Test Results (build & test)
621 tests ±0 617 :heavy_check_mark: ±0 15m 46s :stopwatch: +5s 157 suites ±0 4 :zzz: ±0 157 files ±0 0 :x: ±0
Results for commit 8f113288. ± Comparison against base commit 817406ea.
:recycle: This comment has been updated with latest results.
Unit Test Results (metadata ingestion)
8 files ±0 8 suites ±0 58m 51s :stopwatch: + 1m 26s 759 tests ±0 753 :heavy_check_mark: - 3 3 :zzz: ±0 3 :x: +3 1 520 runs ±0 1 507 :heavy_check_mark: - 6 7 :zzz: ±0 6 :x: +6
For more details on these failures, see this check.
Results for commit 8f113288. ± Comparison against base commit 817406ea.
:recycle: This comment has been updated with latest results.
@mayurinehate Please have a look. Let's aim to get this one in!
Updated source.
Hey @RChygir thank you for the PR.
Mssql tests are failing in CI. Can you please fix it ?
Also it would be great if you can add more tests for MSSQL Jobs as Dataflow and MSSQL Job steps as DataJob.
Hi again @RChygir !! did you chance to see my comment above ?
Let me know if you need any help or have any doubts.
Hi folks! If we do not hear back on this PR soon, we will move to close due to inactivity.
Cheers John
I apologize for the delay in responding to your comments. I will try to correct my mistakes as soon as possible. @mayurinehate Thanks for your help
@RChygir Is it possible to add integration tests related to jobs and procedures integration ? I plan to dwelve into MSSQL side of things next, Tests would greatly help. Existing integration tests are here - https://github.com/datahub-project/datahub/tree/master/metadata-ingestion/tests/integration/sql_server
Tests updated.
HI @RChygir
Are you able to take a look?
If not, I'll close to inactivity and we can reopen at a later date if it makes sense.
Cheers John
Thank you for addressing comments!
@mayurinehate Will be taking another look shortly.
@mayurinehate could you take another look at this PR?
@DmytroYurchuk There's a couple more lingering PR review comments - are you planning on tackling those?
@DmytroYurchuk There's a couple more lingering PR review comments - are you planning on tackling those? @hsheth2 Yes, I am. I`m planning to resolve those and merge this PR in one-two weeks.
@mayurinehate can you take another look at this?
the original mssql.py file deleted
@hsheth2 What do you think about PR?
Can somebody explain more about failed tests?
Two other small things
- can we add documentation on what db permissions are required for these
Where can I find this documentation?
- can we wrap some of the new logic in try excepts (with report.report_error) to make sure that it can't break the entire ingestion
It has been done.
https://github.com/datahub-project/datahub/pull/5363/files#diff-eaa65e030472556b356c66b3f1267bbb1b5213849cc829398ffa0a08d94098b4R280
https://github.com/datahub-project/datahub/pull/5363/files#diff-eaa65e030472556b356c66b3f1267bbb1b5213849cc829398ffa0a08d94098b4R300
Can somebody help with these failed tests?
cypress test failures look unrelated so merging this in.