marquez
Add extension point for lineage event validation
Some users need to be able to validate lineage events. The goal of this feature is to make it easy for them to add validation logic to incoming events and only accept valid ones. Proposal: add a mechanism to put a Python HTTP proxy in front of the OpenLineage ingestion POST HTTP endpoint.
@mobuchowski Do you have a recommendation on adding a simple python proxy in front of the OL endpoint?
With these requirements I'd just write something based on very popular Python libraries: flask and requests. Something like this:
import os

from flask import Flask, request
from requests import post

app = Flask(__name__)
MARQUEZ_URI = os.getenv('MARQUEZ_URI', 'https://marquez:80/api/v1/lineage')


def validate(event: dict) -> bool:
    ...


@app.route('/api/v1/lineage', methods=['POST'])
def proxy():
    # Forward only events that pass validation; reject the rest with 400.
    if validate(request.json):
        resp = post(MARQUEZ_URI, json=request.json)
        return resp.content, resp.status_code
    return b'', 400


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
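The validate stub in the proxy sketch is left open on purpose, since the rules are user-specific. As a rough illustration only, it might check that a few top-level OpenLineage fields are present and apply one site-specific rule. The set of required fields and the ALLOWED_NAMESPACES allow-list are assumptions made up for this example, not something Marquez or OpenLineage prescribes:

```python
# Hypothetical policy: top-level fields every accepted event must carry.
REQUIRED_FIELDS = ("eventTime", "producer", "run", "job")

# Hypothetical site-specific rule: only these job namespaces are accepted.
ALLOWED_NAMESPACES = {"analytics", "ingest"}


def validate(event: dict) -> bool:
    """Return True only if the event passes basic structural checks."""
    if not isinstance(event, dict):
        return False
    if any(field not in event for field in REQUIRED_FIELDS):
        return False

    run, job = event["run"], event["job"]
    if not isinstance(run, dict) or not run.get("runId"):
        return False
    if not isinstance(job, dict) or not job.get("name"):
        return False

    # Reject events whose job namespace is missing or not on the allow-list.
    return job.get("namespace") in ALLOWED_NAMESPACES
```

Keeping validate a pure function of the event dict means it can be unit-tested without starting the proxy or Marquez.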
@mobuchowski has a sound solution to stand up a proxy in front of the Marquez HTTP API server (listening on POST calls to /api/v1/lineage). I wanted to provide a diagram (below) outlining the deployment on k8s:
Note: I used port 5005 for the proxy in the diagram above as an example.
- Define a regex rule on path /api/v1/lineage to route only POST HTTP requests to port 5005 (the port the proxy is listening on) to apply validation on the OL event, then re-route the request to the Marquez HTTP API
- Define a regex rule on path /api to route all other HTTP requests to port 5000 (the port the Marquez HTTP server is listening on)
See: https://kubernetes.github.io/ingress-nginx/user-guide/ingress-path-matching/
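To make the intended routing concrete, here is a small Python model of the two rules above. It is not an ingress configuration, only a sketch of which port each request should end up on. The function name pick_backend_port is hypothetical, and the ports are the example values used in this thread:

```python
PROXY_PORT = 5005    # validation proxy (example port from the note above)
MARQUEZ_PORT = 5000  # Marquez HTTP API server


def pick_backend_port(method: str, path: str):
    """Return the port a request would be routed to, or None if no rule matches."""
    # Rule 1: only POSTs to the lineage endpoint go through the proxy.
    if method.upper() == "POST" and path.rstrip("/") == "/api/v1/lineage":
        return PROXY_PORT
    # Rule 2: every other request under /api goes straight to Marquez.
    if path == "/api" or path.startswith("/api/"):
        return MARQUEZ_PORT
    return None
```

For example, a GET to /api/v1/lineage falls through to the second rule and reaches Marquez directly, so only event ingestion pays the cost of validation.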
@julienledem: should the result of this issue be a design doc outlining our recommendation to stand up a proxy (and a couple of alternatives) in front of Marquez?