Prototype Lineage Analysis Tooling

Open: skrawcz opened this issue 3 years ago • 1 comment

Is your feature request related to a problem? Please describe.
Currently, when given a Hamilton DAG, we don't expose any way to ask questions about it.

This matters for GDPR compliance, data provenance, and similar concerns.

Example questions:

  1. What if I remove this input, what function(s) will I impact?
  2. What uses some PII data and what is the surface area?
  3. If someone requests to be forgotten, what data do I need to delete?
  4. Who should I talk to when I want to make a change that impacts these functions? (e.g. use git blame to surface the function owner?)
  5. What has changed in the DAG between these two commits?
  6. Are there any cycles?
  7. Are there clusters of disjoint nodes? If so, what are they? Maybe I can delete them.
  8. etc.
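Several of these questions reduce to standard graph traversals. A minimal sketch, assuming a hypothetical plain-dict representation in which every node (including raw inputs) maps to the set of nodes it depends on; it does not use Hamilton's internal graph objects, and the names `impacted_by`, `has_cycle`, `clusters` and `diff` are illustrative only. It covers questions 1, 5, 6 and 7.

```python
from collections import deque

# Hypothetical representation (not Hamilton's API): every node, including raw
# inputs, is a key mapping to the set of node names it depends on.
Graph = dict[str, set[str]]


def impacted_by(graph: Graph, node: str) -> set[str]:
    """Q1: all nodes transitively downstream of `node` (breadth-first search)."""
    consumers: dict[str, set[str]] = {name: set() for name in graph}
    for name, deps in graph.items():
        for dep in deps:
            consumers[dep].add(name)
    seen: set[str] = set()
    queue = deque([node])
    while queue:
        for nxt in consumers[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def has_cycle(graph: Graph) -> bool:
    """Q6: three-color depth-first search; reaching a GRAY node means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = dict.fromkeys(graph, WHITE)

    def visit(name: str) -> bool:
        color[name] = GRAY
        for dep in graph[name]:
            if color[dep] == GRAY or (color[dep] == WHITE and visit(dep)):
                return True
        color[name] = BLACK
        return False

    return any(color[name] == WHITE and visit(name) for name in graph)


def clusters(graph: Graph) -> list[set[str]]:
    """Q7: weakly connected components, ignoring edge direction."""
    neighbors: dict[str, set[str]] = {name: set() for name in graph}
    for name, deps in graph.items():
        for dep in deps:
            neighbors[name].add(dep)
            neighbors[dep].add(name)
    seen: set[str] = set()
    components: list[set[str]] = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            for nxt in neighbors[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


def diff(old: Graph, new: Graph) -> dict[str, set]:
    """Q5: nodes and (dependency, consumer) edges added or removed."""
    def edges(g: Graph) -> set[tuple[str, str]]:
        return {(dep, name) for name, deps in g.items() for dep in deps}

    return {
        "added_nodes": set(new) - set(old),
        "removed_nodes": set(old) - set(new),
        "added_edges": edges(new) - edges(old),
        "removed_edges": edges(old) - edges(new),
    }


dag: Graph = {
    "raw_users": set(),
    "email": {"raw_users"},
    "signup_date": {"raw_users"},
    "email_domain": {"email"},
    "tenure_days": {"signup_date"},
    "weather": set(),
}
print(sorted(impacted_by(dag, "raw_users")))
# ['email', 'email_domain', 'signup_date', 'tenure_days']
print(has_cycle(dag))  # False
print(len(clusters(dag)))  # 2
```

Keeping the DAG as plain dependency sets keeps the sketch independent of any driver; a real tool would presumably build this map from Hamilton's own graph, and for question 5 would build one snapshot per commit.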

Describe the solution you'd like
This could be a dedicated "driver class" or something added to the base driver.

Without an end-user workflow in mind, it's a bit hard to specify the API.

Also, this might pair well with #4, e.g. tagging what is PII and what isn't.
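If nodes could carry tags, question 2 becomes a query: start from every node tagged as PII and collect everything downstream of it. A minimal sketch, assuming a hypothetical metadata layout where each node has `deps` and a `tags` dict, and a hypothetical `"pii": "true"` tag convention; none of this is an existing Hamilton API, and `pii_surface_area` is an illustrative name.

```python
from collections import deque

# Hypothetical node metadata: dependencies plus free-form string tags, in the
# spirit of the tagging idea in #4. Not an existing Hamilton API.
NODES = {
    "raw_users": {"deps": set(), "tags": {"pii": "true"}},
    "email": {"deps": {"raw_users"}, "tags": {"pii": "true"}},
    "email_domain": {"deps": {"email"}, "tags": {}},
    "daily_active_users": {"deps": {"raw_users"}, "tags": {}},
    "weather": {"deps": set(), "tags": {}},
}


def pii_surface_area(nodes: dict) -> set[str]:
    """Nodes tagged as PII plus every node that (transitively) consumes one."""
    # Invert dependencies so we can walk from producers to consumers.
    consumers: dict[str, set[str]] = {name: set() for name in nodes}
    for name, meta in nodes.items():
        for dep in meta["deps"]:
            consumers[dep].add(name)
    # Breadth-first search seeded with every PII-tagged node.
    frontier = deque(n for n, m in nodes.items() if m["tags"].get("pii") == "true")
    reached = set(frontier)
    while frontier:
        for nxt in consumers[frontier.popleft()]:
            if nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return reached


print(sorted(pii_surface_area(NODES)))
# ['daily_active_users', 'email', 'email_domain', 'raw_users']
```

The propagation is deliberately conservative: a derived node such as an aggregate count may not itself contain PII, but it still falls inside the surface area that someone should review.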

Describe alternatives you've considered
N/A

Additional context
Many startups and organizations are trying to get a handle on their data and where it is used. Hamilton could provide an easy way to get at this...

skrawcz, Feb 01 '22 00:02

OpenLineage looks exciting: https://openlineage.io/.

elijahbenizzy, Feb 22 '22 19:02

Talked with the folks from Select Star last night; they might be an interesting integration: https://www.selectstar.com/

elijahbenizzy, Oct 29 '22 17:10

We are moving repositories! Please see the new version of this issue at https://github.com/DAGWorks-Inc/hamilton/issues/15. Please also give us a star and update any of your internal links.

elijahbenizzy, Feb 26 '23 17:02