datahub icon indicating copy to clipboard operation
datahub copied to clipboard

feat(ingest): add loading MSSQL Job as dataFlow, job step as dataJob …

Open RChygir opened this issue 2 years ago • 5 comments

…and stored procedures as dataJob

Co-Authored-By: Oleksandr [email protected]

Summary

We add loading dataFlow and dataJob entities. We add configuration flags for separated work with entities.

Changes

  • Add MSSQL Job as dataFlow;
  • Add MSSQL Job Step as dataJob with properties:
Property Example
command EXEC [DataBaseName].[SchemaName].[ProcedureName]
date_created 1990-01-01 00:00:00.000000
date_modified 1990-01-01 00:00:00.000000
description Full description
job_id 98765432-1987-6543-2198-7654321987654
job_name JobName
step_id 1
step_name StepName
subsystem TSQL
  • Add Stored Procedure as dataJob with properties;
Property Example
code CREATE PROCEDURE [SchemaName].[ProcedureName] AS BEGIN SET NOCOUNT ON; END
create_date 1990-01-01 00:00:00.000000
definition Full description
depending_on_procedure {}
input parameters []
modify_date 1990-01-01 00:00:00.000000
procedure_depends_on {'SchemaName.rc.TableName': 'USER_TABLE'}
  • Stored procedures pack into dataFlow for StoredProcedures;
  • Add configuration flags:
    • include_stored_procedures - for load stored procedures as dataJob;
    • include_code - for load stored procedure code as property;
    • include_jobs - for load MSSQL Jobs as dataFlow and Steps as dataJob;
    • include_descriptions - for load tables and table columns descriptions.

Checklist

  • [ ] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • [ ] Links to related issues (if applicable)
  • [ ] Tests for the changes have been added/updated (if applicable)
  • [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

RChygir avatar Jul 08 '22 10:07 RChygir

Unit Test Results (build & test)

621 tests  ±0   617 :heavy_check_mark: ±0   15m 46s :stopwatch: +5s 157 suites ±0       4 :zzz: ±0  157 files   ±0       0 :x: ±0 

Results for commit 8f113288. ± Comparison against base commit 817406ea.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Jul 08 '22 17:07 github-actions[bot]

Unit Test Results (metadata ingestion)

       8 files  ±0         8 suites  ±0   58m 51s :stopwatch: + 1m 26s    759 tests ±0     753 :heavy_check_mark:  - 3  3 :zzz: ±0  3 :x: +3  1 520 runs  ±0  1 507 :heavy_check_mark:  - 6  7 :zzz: ±0  6 :x: +6 

For more details on these failures, see this check.

Results for commit 8f113288. ± Comparison against base commit 817406ea.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Jul 08 '22 18:07 github-actions[bot]

@mayurinehate Please have a look. Let's aim to get this one in!

jjoyce0510 avatar Nov 16 '22 21:11 jjoyce0510

Updated source.

RChygir avatar Nov 17 '22 11:11 RChygir

Hey @RChygir thank you for the PR.

Mssql tests are failing in CI. Can you please fix it ?

Also it would be great if you can add more tests for MSSQL Jobs as Dataflow and MSSQL Job steps as DataJob.

mayurinehate avatar Nov 18 '22 15:11 mayurinehate

Hi again @RChygir !! did you chance to see my comment above ?

Let me know if you need any help or have any doubts.

mayurinehate avatar Dec 05 '22 16:12 mayurinehate

Hi folks! If we do not hear back on this PR soon, we will move to close due to inactivity.

Cheers John

jjoyce0510 avatar Dec 22 '22 22:12 jjoyce0510

I apologize for the delay in responding to your comments. I will try to correct my mistakes as soon as possible. @mayurinehate Thanks for your help

RChygir avatar Dec 23 '22 09:12 RChygir

@RChygir Is it possible to add integration tests related to jobs and procedures integration ? I plan to dwelve into MSSQL side of things next, Tests would greatly help. Existing integration tests are here - https://github.com/datahub-project/datahub/tree/master/metadata-ingestion/tests/integration/sql_server

mayurinehate avatar Jan 17 '23 07:01 mayurinehate

Tests updated.

RChygir avatar Jan 18 '23 10:01 RChygir

HI @RChygir

Are you able to take a look?

If not, I'll close to inactivity and we can reopen at a later date if it makes sense.

Cheers John

jjoyce0510 avatar Jan 24 '23 03:01 jjoyce0510

Thank you for addressing comments!

@mayurinehate Will be taking another look shortly.

jjoyce0510 avatar Jan 24 '23 19:01 jjoyce0510

@mayurinehate could you take another look at this PR?

hsheth2 avatar Mar 22 '23 21:03 hsheth2

@DmytroYurchuk There's a couple more lingering PR review comments - are you planning on tackling those?

hsheth2 avatar Jun 26 '23 23:06 hsheth2

@DmytroYurchuk There's a couple more lingering PR review comments - are you planning on tackling those? @hsheth2 Yes, I am. I`m planning to resolve those and merge this PR in one-two weeks.

DmytroYurchuk avatar Jun 27 '23 20:06 DmytroYurchuk

@mayurinehate can you take another look at this?

hsheth2 avatar Jul 19 '23 16:07 hsheth2

the original mssql.py file deleted

DmytroYurchuk avatar Jul 20 '23 12:07 DmytroYurchuk

@hsheth2 What do you think about PR?

DmytroYurchuk avatar Jul 27 '23 08:07 DmytroYurchuk

Can somebody explain more about failed tests?

DmytroYurchuk avatar Jul 27 '23 08:07 DmytroYurchuk

Two other small things

  • can we add documentation on what db permissions are required for these

Where can I find this documentation?

  • can we wrap some of the new logic in try excepts (with report.report_error) to make sure that it can't break the entire ingestion

It has been done.

https://github.com/datahub-project/datahub/pull/5363/files#diff-eaa65e030472556b356c66b3f1267bbb1b5213849cc829398ffa0a08d94098b4R280

https://github.com/datahub-project/datahub/pull/5363/files#diff-eaa65e030472556b356c66b3f1267bbb1b5213849cc829398ffa0a08d94098b4R300

DmytroYurchuk avatar Jul 31 '23 20:07 DmytroYurchuk

Can somebody help with these failed tests?

DmytroYurchuk avatar Aug 16 '23 15:08 DmytroYurchuk

cypress test failures look unrelated so merging this in.

anshbansal avatar Aug 24 '23 09:08 anshbansal