Undocumented breaking change in Ownership model of 0.13.0

Open nclaeys opened this issue 10 months ago • 2 comments

datahub issue:

Describe the bug In release 0.13.0, the property ownerTypes was added to the Ownership object, but this is not documented as a breaking change. An updated client cannot ingest new aspects until the gms is updated to the same version, because the older gms rejects the payload: it does not recognise the ownerTypes property.

To Reproduce Steps to reproduce the behavior:

  1. Install gms 0.12.1
  2. Use the acryl-datahub-airflow-plugin 0.13.0 to ingest lineage information
  3. Ingest the lineage; the emitter fails to send the metadata to DataHub
  4. See error
[2024-03-25, 00:18:32 UTC] {datahub_plugin_v22.py:88} ERROR - Error sending metadata to datahub: ('Unable to emit metadata to DataHub GMS: Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /ownerTypes :: unrecognized field found but not allowed\n', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /ownerTypes :: unrecognized field found but not allowed\n', 'status': 422})
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/datahub/emitter/rest_emitter.py", line 285, in _emit_generic
    response.raise_for_status()
  File "/home/airflow/.local/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://datahub.rainman.dataminded.cloud/api/gms/aspects?action=ingestProposal
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/datahub/emitter/rest_emitter.py", line 223, in emit
    self.emit_mcp(item)
  File "/home/airflow/.local/lib/python3.11/site-packages/datahub/emitter/rest_emitter.py", line 264, in emit_mcp
    self._emit_generic(url, payload)
  File "/home/airflow/.local/lib/python3.11/site-packages/datahub/emitter/rest_emitter.py", line 293, in _emit_generic
    raise OperationalError(
datahub.configuration.common.OperationalError: ('Unable to emit metadata to DataHub GMS: Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /ownerTypes :: unrecognized field found but not allowed\n', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /ownerTypes :: unrecognized field found but not allowed\n', 'status': 422})
[2024-03-25, 00:18:32 UTC] {logging_mixin.py:188} INFO - Exception: Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/datahub/emitter/rest_emitter.py", line 285, in _emit_generic
    response.raise_for_status()
  File "/home/airflow/.local/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://datahub.rainman.dataminded.cloud/api/gms/aspects?action=ingestProposal
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/datahub_airflow_plugin/datahub_plugin_v22.py", line 262, in custom_on_success_callback
    datahub_task_status_callback(context, status=InstanceRunResult.SUCCESS)
  File "/home/airflow/.local/lib/python3.11/site-packages/datahub_airflow_plugin/datahub_plugin_v22.py", line 115, in datahub_task_status_callback
    dataflow.emit(emitter, callback=_make_emit_callback(task.log))
  File "/home/airflow/.local/lib/python3.11/site-packages/datahub/api/entities/datajob/dataflow.py", line 171, in emit
    emitter.emit(mcp, callback)
  File "/home/airflow/.local/lib/python3.11/site-packages/datahub/emitter/rest_emitter.py", line 223, in emit
    self.emit_mcp(item)
  File "/home/airflow/.local/lib/python3.11/site-packages/datahub/emitter/rest_emitter.py", line 264, in emit_mcp
    self._emit_generic(url, payload)
  File "/home/airflow/.local/lib/python3.11/site-packages/datahub/emitter/rest_emitter.py", line 293, in _emit_generic
    raise OperationalError(
datahub.configuration.common.OperationalError: ('Unable to emit metadata to DataHub GMS: Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /ownerTypes :: unrecognized field found but not allowed\n', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /ownerTypes :: unrecognized field found but not allowed\n', 'status': 422})

The change was introduced in this commit: https://github.com/datahub-project/datahub/commit/ed10a8d8cca3b17e982db6d14ea435833c5a87ea#diff-d08f131b5220b63a5f3ce2e254f76e9ce0de6ac14f00cae2be14b553d0e9a7a4

Expected behavior Document this as a breaking change so that people know to check version compatibility before upgrading. We had customers update their client version without knowing that it would break metadata ingestion.

Screenshots

Call to DataHub in which the metadata was rejected:

curl -X POST -H 'User-Agent: python-requests/2.31.0' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' -H 'Authorization: <redacted>' --data '{"proposal": {"entityType": "dataFlow", "entityUrn": "urn:li:dataFlow:(airflow,airflow-task,prod)", "changeType": "UPSERT", "aspectName": "ownership", "aspect": {"value": "{\"owners\": [{\"owner\": \"urn:li:corpuser:SomeUser\", \"type\": \"DEVELOPER\", \"source\": {\"type\": \"SERVICE\"}}], \"ownerTypes\": {}, \"lastModified\": {\"time\": 0, \"actor\": \"urn:li:corpuser:airflow\"}}", "contentType": "application/json"}}}' 'https://<url>/api/gms/aspects?action=ingestProposal'

With version 0.12.1.5 of the acryl-datahub-airflow-plugin, the metadata update succeeds:

curl -X POST -H 'User-Agent: python-requests/2.31.0' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' -H 'Authorization: <redacted>' --data '{"proposal": {"entityType": "dataJob", "entityUrn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,airflow-task,prod),ingest-weather-mx_nano)", "changeType": "UPSERT", "aspectName": "ownership", "aspect": {"value": "{\"owners\": [{\"owner\": \"urn:li:corpuser:SomeUser\", \"type\": \"DEVELOPER\", \"source\": {\"type\": \"SERVICE\"}}], \"lastModified\": {\"time\": 0, \"actor\": \"urn:li:corpuser:airflow\"}}", "contentType": "application/json"}}}' 'https://<url>/api/gms/aspects?action=ingestProposal'
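To make the difference between the two requests explicit, here is a minimal sketch that parses the two aspect values shown above and lists the top-level fields present only in the rejected one. The two requests target different entities (dataFlow and dataJob), so only the inner ownership aspect is compared. The helper name `extra_top_level_fields` is hypothetical and is not part of any DataHub library.

```python
import json

# Inner aspect JSON copied from the two curl requests above.
rejected = json.loads(
    '{"owners": [{"owner": "urn:li:corpuser:SomeUser", "type": "DEVELOPER", '
    '"source": {"type": "SERVICE"}}], "ownerTypes": {}, '
    '"lastModified": {"time": 0, "actor": "urn:li:corpuser:airflow"}}'
)
accepted = json.loads(
    '{"owners": [{"owner": "urn:li:corpuser:SomeUser", "type": "DEVELOPER", '
    '"source": {"type": "SERVICE"}}], '
    '"lastModified": {"time": 0, "actor": "urn:li:corpuser:airflow"}}'
)


def extra_top_level_fields(newer: dict, older: dict) -> set:
    """Return top-level keys present in `newer` but absent from `older`.

    Hypothetical helper for illustration only.
    """
    return set(newer) - set(older)


extra = extra_top_level_fields(rejected, accepted)
# Every field of the accepted payload appears unchanged in the rejected one.
shared_fields_match = all(rejected[key] == accepted[key] for key in accepted)
print(extra, shared_fields_match)
```

Under these assumptions, the only extra field is the empty ownerTypes map, which matches the field named in the validation error.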

Desktop (please complete the following information):

  • OS: ubuntu
  • Browser /
  • Version 0.12.0

nclaeys avatar Mar 26 '24 08:03 nclaeys

We are facing the same issue, so I guess upgrading gms to 0.13 will solve it?

gabrielwry avatar Apr 09 '24 15:04 gabrielwry

We're encountering the same issue. Has anyone found a solution?

younesidhamou avatar May 03 '24 14:05 younesidhamou

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions[bot] avatar Jun 03 '24 01:06 github-actions[bot]

The client and server versions are not aligned. It looks like the client is newer, and when it sends the default empty ownerTypes map, the server rejects it. There are two options: downgrade the client or upgrade the service. A later improvement on the server side ignores unknown properties; when active, it would drop the ownerTypes field sent by a newer client until the service side was upgraded.

david-leifker avatar Sep 23 '24 18:09 david-leifker
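As a rough illustration of the version alignment discussed in this thread, the following sketch flags a client whose major.minor version is ahead of the server's. It is an assumption that a major.minor comparison is the right granularity; the function names are hypothetical, the version strings must be supplied by the caller, and how to obtain the server version is not shown.

```python
def parse_version(version: str) -> tuple:
    """Turn '0.12.1.5' into (0, 12, 1, 5). Non-numeric parts are not handled."""
    return tuple(int(part) for part in version.split("."))


def client_ahead_of_server(client_version: str, server_version: str) -> bool:
    """Return True when the client's major.minor is newer than the server's.

    Hypothetical check for illustration; not part of any DataHub library.
    """
    return parse_version(client_version)[:2] > parse_version(server_version)[:2]


# Versions reported in this issue: plugin 0.13.0 and 0.12.1.5 against gms 0.12.1.
failing_pair = client_ahead_of_server("0.13.0", "0.12.1")
working_pair = client_ahead_of_server("0.12.1.5", "0.12.1")
print(failing_pair, working_pair)
```

A check like this could run before upgrading a client package, so that a newer client is not pointed at an older server by accident.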