Undocumented breaking change in Ownership model of 0.13.0
datahub issue:
Describe the bug In release 0.13.0, the property ownerTypes was added to the Ownership object, but this is not documented as a breaking change. A client updated to 0.13.0 cannot ingest ownership aspects until the gms is updated to the same version, because the older gms does not recognise the ownerTypes property and rejects the record during validation.
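A minimal sketch of a pre-flight check for this kind of mismatch. The helper names `major_minor` and `client_is_newer` are hypothetical and not part of DataHub, and how the server version is obtained is left out. It assumes that a client whose major.minor is ahead of the gms is the risky case, which matches the versions in this report (0.13.0 fails against gms 0.12.1, 0.12.1.5 works).

```python
def major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a dotted version string such as '0.12.1.5'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def client_is_newer(client_version: str, server_version: str) -> bool:
    """True when the client's major.minor is ahead of the server's."""
    return major_minor(client_version) > major_minor(server_version)


# Versions taken from this report.
print(client_is_newer("0.13.0", "0.12.1"))    # True: the failing combination
print(client_is_newer("0.12.1.5", "0.12.1"))  # False: the working combination
```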
To Reproduce Steps to reproduce the behavior:
- Install gms 0.12.1
- Use the acryl-datahub-airflow-plugin 0.13.0 to ingest lineage information
- Run the ingestion; the emitter fails to send the metadata to DataHub
- See error
[2024-03-25, 00:18:32 UTC] {datahub_plugin_v22.py:88} ERROR - Error sending metadata to datahub: ('Unable to emit metadata to DataHub GMS: Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /ownerTypes :: unrecognized field found but not allowed\n', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /ownerTypes :: unrecognized field found but not allowed\n', 'status': 422})
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.11/site-packages/datahub/emitter/rest_emitter.py", line 285, in _emit_generic
response.raise_for_status()
File "/home/airflow/.local/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://datahub.rainman.dataminded.cloud/api/gms/aspects?action=ingestProposal
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.11/site-packages/datahub/emitter/rest_emitter.py", line 223, in emit
self.emit_mcp(item)
File "/home/airflow/.local/lib/python3.11/site-packages/datahub/emitter/rest_emitter.py", line 264, in emit_mcp
self._emit_generic(url, payload)
File "/home/airflow/.local/lib/python3.11/site-packages/datahub/emitter/rest_emitter.py", line 293, in _emit_generic
raise OperationalError(
datahub.configuration.common.OperationalError: ('Unable to emit metadata to DataHub GMS: Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /ownerTypes :: unrecognized field found but not allowed\n', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /ownerTypes :: unrecognized field found but not allowed\n', 'status': 422})
[2024-03-25, 00:18:32 UTC] {logging_mixin.py:188} INFO - Exception: Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.11/site-packages/datahub/emitter/rest_emitter.py", line 285, in _emit_generic
response.raise_for_status()
File "/home/airflow/.local/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://datahub.rainman.dataminded.cloud/api/gms/aspects?action=ingestProposal
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.11/site-packages/datahub_airflow_plugin/datahub_plugin_v22.py", line 262, in custom_on_success_callback
datahub_task_status_callback(context, status=InstanceRunResult.SUCCESS)
File "/home/airflow/.local/lib/python3.11/site-packages/datahub_airflow_plugin/datahub_plugin_v22.py", line 115, in datahub_task_status_callback
dataflow.emit(emitter, callback=_make_emit_callback(task.log))
File "/home/airflow/.local/lib/python3.11/site-packages/datahub/api/entities/datajob/dataflow.py", line 171, in emit
emitter.emit(mcp, callback)
File "/home/airflow/.local/lib/python3.11/site-packages/datahub/emitter/rest_emitter.py", line 223, in emit
self.emit_mcp(item)
File "/home/airflow/.local/lib/python3.11/site-packages/datahub/emitter/rest_emitter.py", line 264, in emit_mcp
self._emit_generic(url, payload)
File "/home/airflow/.local/lib/python3.11/site-packages/datahub/emitter/rest_emitter.py", line 293, in _emit_generic
raise OperationalError(
datahub.configuration.common.OperationalError: ('Unable to emit metadata to DataHub GMS: Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /ownerTypes :: unrecognized field found but not allowed\n', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /ownerTypes :: unrecognized field found but not allowed\n', 'status': 422})
The change was introduced in this commit: https://github.com/datahub-project/datahub/commit/ed10a8d8cca3b17e982db6d14ea435833c5a87ea#diff-d08f131b5220b63a5f3ce2e254f76e9ce0de6ac14f00cae2be14b553d0e9a7a4
Expected behavior Document this as a breaking change so that users know to check that their gms version matches before upgrading the client. We had customers update their client version without knowing that it would break metadata ingestion.
Screenshots
Call to DataHub in which the metadata is rejected:
curl -X POST -H 'User-Agent: python-requests/2.31.0' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' -H 'Authorization: <redacted>' --data '{"proposal": {"entityType": "dataFlow", "entityUrn": "urn:li:dataFlow:(airflow,airflow-task,prod)", "changeType": "UPSERT", "aspectName": "ownership", "aspect": {"value": "{\"owners\": [{\"owner\": \"urn:li:corpuser:SomeUser\", \"type\": \"DEVELOPER\", \"source\": {\"type\": \"SERVICE\"}}], \"ownerTypes\": {}, \"lastModified\": {\"time\": 0, \"actor\": \"urn:li:corpuser:airflow\"}}", "contentType": "application/json"}}}' 'https://<url>/api/gms/aspects?action=ingestProposal'
With version 0.12.1.5 of the acryl-datahub-airflow-plugin, the metadata update succeeds:
curl -X POST -H 'User-Agent: python-requests/2.31.0' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' -H 'Authorization: <redacted>' --data '{"proposal": {"entityType": "dataJob", "entityUrn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,airflow-task,prod),ingest-weather-mx_nano)", "changeType": "UPSERT", "aspectName": "ownership", "aspect": {"value": "{\"owners\": [{\"owner\": \"urn:li:corpuser:SomeUser\", \"type\": \"DEVELOPER\", \"source\": {\"type\": \"SERVICE\"}}], \"lastModified\": {\"time\": 0, \"actor\": \"urn:li:corpuser:airflow\"}}", "contentType": "application/json"}}}' 'https://<url>/api/gms/aspects?action=ingestProposal'
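To make the difference between the two requests explicit, here is a short sketch that compares the top-level fields of the two ownership aspect values. The JSON strings are the inner `aspect.value` payloads copied from the curl commands above; the variable names are only illustrative.

```python
import json

# Inner "aspect.value" JSON from the rejected request (plugin 0.13.0).
rejected = json.loads(
    '{"owners": [{"owner": "urn:li:corpuser:SomeUser", "type": "DEVELOPER", '
    '"source": {"type": "SERVICE"}}], "ownerTypes": {}, '
    '"lastModified": {"time": 0, "actor": "urn:li:corpuser:airflow"}}'
)

# Inner "aspect.value" JSON from the accepted request (plugin 0.12.1.5).
accepted = json.loads(
    '{"owners": [{"owner": "urn:li:corpuser:SomeUser", "type": "DEVELOPER", '
    '"source": {"type": "SERVICE"}}], '
    '"lastModified": {"time": 0, "actor": "urn:li:corpuser:airflow"}}'
)

# Top-level fields present in one payload but not the other.
extra_fields = sorted(set(rejected) - set(accepted))
missing_fields = sorted(set(accepted) - set(rejected))

print(extra_fields)    # ['ownerTypes']
print(missing_fields)  # []
```

The only extra top-level field is the empty `ownerTypes` map, which is the field named in the 422 error.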
Desktop (please complete the following information):
- OS: ubuntu
- Browser /
- Version 0.12.0
We are facing the same issue, so I guess upgrading gms to 0.13 will solve it?
We're encountering the same issue. Has anyone found a solution?
This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io
The client and server versions are not aligned. It looks like the client is newer, and when it sends the default empty ownerTypes map to the server, the server rejects it. There are two options: downgrade the client or upgrade the service. A later improvement on the server side makes it ignore unknown properties; when active, the server drops the ownerTypes field sent by a newer client until the service is upgraded.
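The two server behaviours described here can be illustrated with a toy model. This is not DataHub's validation code: `validate_strict`, `drop_unknown`, and `KNOWN_OWNERSHIP_FIELDS` are hypothetical names, and the set of known fields is an assumption taken from the accepted payload earlier in this issue (the real Ownership schema may define more).

```python
# Assumption: top-level fields an older server knows about, taken from the
# payload that was accepted in this issue.
KNOWN_OWNERSHIP_FIELDS = {"owners", "lastModified"}


def validate_strict(record: dict, known: set = KNOWN_OWNERSHIP_FIELDS) -> None:
    """Reject the record if it carries any field the server does not know."""
    unknown = sorted(set(record) - known)
    if unknown:
        raise ValueError(
            f"/{unknown[0]} :: unrecognized field found but not allowed"
        )


def drop_unknown(record: dict, known: set = KNOWN_OWNERSHIP_FIELDS) -> dict:
    """Lenient mode: keep only the fields the server knows, discard the rest."""
    return {key: value for key, value in record.items() if key in known}


record = {"owners": [], "ownerTypes": {}, "lastModified": {"time": 0}}

try:
    validate_strict(record)
except ValueError as err:
    print(err)  # /ownerTypes :: unrecognized field found but not allowed

print(drop_unknown(record))  # {'owners': [], 'lastModified': {'time': 0}}
```

In the strict model the newer client's record fails outright, as in the 422 above; in the lenient model the extra field is silently discarded and the rest of the aspect is accepted.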