datahub
datahub copied to clipboard
Vertica timestamp(0) causes an unable to get column information error in DataHub CLI
Describe the bug
A warning is received in Datahub CLI while executing ingestion from Vertica DB via Vertica plugin.
Warning output: 'warnings': {'XXX_TABLE': ['unable to get column information due to an error -> __init__() got an unexpected keyword ' "argument 'precision'"]}
It seems that the error is caused when an attribute data type is set to timestamp(0). The error can be bypassed when commenting out the following lines in venv\Lib\site-packages\datahub\ingestion\source\sql\vertica.py, _get_column_info
function.
elif attype in ("timestamp", "time"):
kwargs["timezone"] = False
# if charlen:
# kwargs["precision"] = int(charlen)
args = ()
The created kwargs object precision attribute isn't compatible with the timestamp parameters: {'timezone': False, 'precision': 0}
, it's possible that the same error would appear when working with timestampz data type.
Commenting most likely works as timestamp precision is optional - TIMESTAMP/TIMESTAMPTZ
To Reproduce Create a table in Vertica including an attribute with timestamp(0):
CREATE TABLE YYY.XXX_TABLE
(
load_timestamp timestamp(0)
);
Execute the pipeline using datahub CLI.
Expected behavior Load meta from Vertica to file using datahub CLI w/o warnings.
Desktop (please complete the following information):
- Python 3.8.2
- acryl-datahub 0.8.40 / DataHub CLI 0.8.40
- wheel 0.37.1
- setuptools 62.6.0
- pip 22.1.2
Hi, is there a possibility of receiving an ETA on when this might be resolved?
Some additional information regarding this issue.
In case of timestamp(x) or timestamptz(x) as mentioned, the following warning is raised:
'warnings': {'XSCHEMA.YTABLE': ['unable to get column information due to an error -> __init__() got an unexpected keyword ' "argument 'precision'"]}
The precision argument is initiated in \Lib\site-packages\datahub\ingestion\source\sql\vertica.py:
elif attype in ("timestamptz", "timetz"):
kwargs["timezone"] = True
if charlen:
kwargs["precision"] = int(charlen) # type: ignore
args = () # type: ignore
elif attype in ("timestamp", "time"):
kwargs["timezone"] = False
if charlen:
kwargs["precision"] = int(charlen) # type: ignore
This calls for the TIMESTAMP class in \Lib\site-packages\sqlalchemy\sql\sqltypes.py:
class TIMESTAMP(DateTime):
"""The SQL TIMESTAMP type.
:class:`_types.TIMESTAMP` datatypes have support for timezone
storage on some backends, such as PostgreSQL and Oracle. Use the
:paramref:`~types.TIMESTAMP.timezone` argument in order to enable
"TIMESTAMP WITH TIMEZONE" for these backends.
"""
__visit_name__ = "TIMESTAMP"
def __init__(self, timezone=False):
"""Construct a new :class:`_types.TIMESTAMP`.
:param timezone: boolean. Indicates that the TIMESTAMP type should
enable timezone support, if available on the target database.
On a per-dialect basis is similar to "TIMESTAMP WITH TIMEZONE".
If the target database does not support timezones, this flag is
ignored.
"""
super(TIMESTAMP, self).__init__(timezone=timezone)
def get_dbapi_type(self, dbapi):
return dbapi.TIMESTAMP
Which returns the precision error as there's no precision argument defined.
So an easy fix would be to add the following argument and code line into the class:
class TIMESTAMP(DateTime):
"""The SQL TIMESTAMP type.
:class:`_types.TIMESTAMP` datatypes have support for timezone
storage on some backends, such as PostgreSQL and Oracle. Use the
:paramref:`~types.TIMESTAMP.timezone` argument in order to enable
"TIMESTAMP WITH TIMEZONE" for these backends.
"""
__visit_name__ = "TIMESTAMP"
def __init__(self, timezone=False, precision=None):
"""Construct a new :class:`_types.TIMESTAMP`.
:param timezone: boolean. Indicates that the TIMESTAMP type should
enable timezone support, if available on the target database.
On a per-dialect basis is similar to "TIMESTAMP WITH TIMEZONE".
If the target database does not support timezones, this flag is
ignored.
"""
super(TIMESTAMP, self).__init__(timezone=timezone)
self.precision = precision
def get_dbapi_type(self, dbapi):
return dbapi.TIMESTAMP
Which results in the following schema output:
{
"fieldPath": "DATE_CREATED",
"jsonPath": null,
"nullable": true,
"description": null,
"created": null,
"lastModified": null,
"type": {
"type": {
"com.linkedin.pegasus2avro.schema.TimeType": {}
}
},
"nativeDataType": "TIMESTAMP(precision=0)",
"recursive": false,
"globalTags": null,
"glossaryTerms": null,
"isPartOfKey": false,
"isPartitioningKey": null,
"jsonProps": null
}
Hi martinontcode! Would you mind opening a PR with the fix so we can move it through review?
This issue is stale because it has been open for 15 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io
Hi, the issue is still present.
The complexity of this issue comes from the fact that the used import from sqlalchemy.sql import sqltypes
has the missing "precision" parameter.
Is there a way to fix this from Datahub perspective?
This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io
I submitted a PR attempting to fix this problem. It basically is a hacky implementation of the suggestion @martinontcode made.
This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io
This still is a problem in the current version. Could anybody review and comment on the opened PR?