Add extension point for lineage event validation

Open julienledem opened this issue 2 years ago • 4 comments

Some users need to validate lineage events. The goal of this feature is to make it easy for them to add validation logic for incoming events and accept only valid ones. Proposal: add a mechanism for placing a Python HTTP proxy in front of the OpenLineage ingestion endpoint (HTTP POST).

julienledem, Jul 26 '22 00:07

@mobuchowski Do you have a recommendation for adding a simple Python proxy in front of the OL endpoint?

julienledem, Aug 03 '22 00:08

Given these requirements, I'd just write something based on two very popular Python libraries, flask and requests. Something like this:

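import os

from flask import Flask, request
from requests import post


app = Flask(__name__)
# Full URL of the Marquez lineage endpoint that valid events are forwarded to.
MARQUEZ_URI = os.getenv('MARQUEZ_URI', 'https://marquez:80/api/v1/lineage')


def validate(event: dict) -> bool:
    # Custom validation logic goes here; return True to accept the event.
    ...


@app.route('/api/v1/lineage', methods=['POST'])
def proxy():
    # get_json(silent=True) returns None instead of raising on a bad body.
    event = request.get_json(silent=True)
    if event is not None and validate(event):
        # Forward the validated event and relay Marquez's response.
        response = post(MARQUEZ_URI, json=event)
        return response.content, response.status_code
    return b'', 400


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)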
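
As an illustration of what could go into the validate placeholder, here is a minimal sketch. It assumes the event is an OpenLineage run event and only checks for the presence of a few core fields (eventTime, producer, schemaURL, run.runId, job.namespace, job.name) plus a parseable timestamp. The REQUIRED_FIELDS name and the exact set of checks are assumptions made for this example, not something specified in this issue; real validation rules would depend on the user's own requirements.

```python
from datetime import datetime

# Top-level fields this sketch treats as mandatory. This list is an
# assumption for illustration; adjust it to the spec version in use.
REQUIRED_FIELDS = ("eventTime", "producer", "schemaURL", "run", "job")


def validate(event: dict) -> bool:
    """Return True only if the event has the minimal expected shape."""
    if not isinstance(event, dict):
        return False
    if any(field not in event for field in REQUIRED_FIELDS):
        return False
    run, job = event["run"], event["job"]
    # The run must carry a non-empty runId.
    if not isinstance(run, dict) or not run.get("runId"):
        return False
    # The job must carry a non-empty namespace and name.
    if not isinstance(job, dict) or not job.get("namespace") or not job.get("name"):
        return False
    # eventTime must parse as an ISO 8601 timestamp ("Z" mapped to UTC offset).
    try:
        datetime.fromisoformat(str(event["eventTime"]).replace("Z", "+00:00"))
    except ValueError:
        return False
    return True
```

Any stricter policy (naming conventions, allowed namespaces, required facets) could be layered on top in the same function, since the proxy only needs a boolean answer.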

mobuchowski, Aug 03 '22 11:08

@mobuchowski has a sound solution for standing up a proxy in front of the Marquez HTTP API server (listening for POST calls to /api/v1/lineage). I wanted to provide a diagram (below) outlining the deployment on k8s:

[Diagram: Marquez with Proxy]

Note: I used port 5005 for the proxy in the diagram above as an example.

  • Define a regex rule on path /api/v1/lineage that routes only HTTP POST requests to port 5005 (the port the proxy is listening on); the proxy validates the OL event, then forwards the request to the Marquez HTTP API
  • Define a regex rule on path /api that routes all other HTTP requests to port 5000 (the port the Marquez HTTP server is listening on)

See: https://kubernetes.github.io/ingress-nginx/user-guide/ingress-path-matching/
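
To make the intent of the two routing rules above concrete, here is a small Python sketch of the routing decision only. It is not ingress-nginx configuration; the upstream_port function and the constant names are hypothetical, and the ports are the example values from this comment.

```python
from typing import Optional

PROXY_PORT = 5005    # example proxy port used in the diagram
MARQUEZ_PORT = 5000  # port the Marquez HTTP server listens on


def upstream_port(method: str, path: str) -> Optional[int]:
    """Return the port a request would be routed to, or None if unmatched."""
    # Rule 1: only POSTs to the lineage endpoint go through the proxy.
    if method.upper() == "POST" and path.rstrip("/") == "/api/v1/lineage":
        return PROXY_PORT
    # Rule 2: every other request under /api goes straight to Marquez.
    if path == "/api" or path.startswith("/api/"):
        return MARQUEZ_PORT
    return None
```

Under this model, a GET to /api/v1/lineage or any request to another /api path bypasses the proxy, so only event ingestion pays the validation cost.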

wslulciuc, Aug 03 '22 22:08

@julienledem: should the result of this issue be a design doc outlining our recommendation to stand up a proxy in front of Marquez (plus a couple of alternatives)?

wslulciuc, Aug 03 '22 22:08