dbt-databricks
Cannot run dbt docs generate with JSON logs
Describe the bug
When running dbt docs generate with JSON logs enabled I receive an error: Encountered an error while generating catalog: Object of type DatabricksRelation is not JSON serializable.
This occurs with dbt-databricks 1.1.0 in all three environments I tried: locally (Windows), in Docker (Linux), and in the preview Databricks dbt task type.
It does not occur in earlier versions.
It does not occur with the default log format.
Steps To Reproduce
- Run dbt --log-format json docs generate
Expected behavior
Doc site generates correctly, with JSON logs.
Screenshots and log output
{"code": "E044", "data": {}, "invocation_id": "4240ec3d-ef2f-4772-8540-d4d522a8c717", "level": "info", "log_version": 2, "msg": "Building catalog", "pid": 1628, "thread_name": "MainThread", "ts": "2022-05-25T14:22:49.779873Z", "type": "log_line"}
{"code": "Z046", "data": {"log_fmt": null, "msg": "Encountered an error while generating catalog: Object of type DatabricksRelation is not JSON serializable"}, "invocation_id": "4240ec3d-ef2f-4772-8540-d4d522a8c717", "level": "warn", "log_version": 2, "msg": "Encountered an error while generating catalog: Object of type DatabricksRelation is not JSON serializable", "pid": 1628, "thread_name": "MainThread", "ts": "2022-05-25T14:22:49.783320Z", "type": "log_line"}
{"code": "E041", "data": {"num_exceptions": 1}, "invocation_id": "4240ec3d-ef2f-4772-8540-d4d522a8c717", "level": "error", "log_version": 2, "msg": "dbt encountered 1 failure while writing the catalog", "pid": 1628, "thread_name": "MainThread", "ts": "2022-05-25T14:22:49.792754Z", "type": "log_line"}
System information
Windows:
Core:
- installed: 1.1.0
- latest: 1.1.1 - Update available!
Your version of dbt-core is out of date!
You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation
Plugins:
- databricks: 1.1.0 - Up to date!
- spark: 1.1.0 - Up to date!
The operating system you're using: Windows/Linux/Databricks
The output of python --version:
Windows:
Python 3.9.0
Additional context
@ueshin @allisonwang-db I can repro this on my laptop, too.
@jtcohen6 Could you help take a look at this issue?
Seems like SparkRelation or even BaseRelation are not serializable.
>>> json.dumps(SparkRelation.create(schema='a', identifier='b'))
...
TypeError: Object of type SparkRelation is not JSON serializable
>>> json.dumps(BaseRelation.create(schema='a', identifier='b'))
Traceback (most recent call last):
...
TypeError: Object of type BaseRelation is not JSON serializable
Thanks.
It seems that 1.1 tries to log more than 1.0 did, and that is what breaks the command.
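To illustrate why this only shows up with JSON logs and not with the default log format, here is a minimal sketch. It does not use dbt's actual logger: FakeRelation, format_text and format_json are hypothetical stand-ins, and the only assumption is that some object that is not a JSON type ends up in the log event's data.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRelation:
    """Hypothetical stand-in for DatabricksRelation; not the real class."""
    schema: str
    identifier: str


def format_text(msg: str, data: dict) -> str:
    # Text formatting only needs str()/repr(), which every object supports.
    return f"{msg} {data}"


def format_json(msg: str, data: dict) -> str:
    # JSON formatting requires every value in `data` to be a JSON type.
    return json.dumps({"msg": msg, "data": data})


event_data = {"relation": FakeRelation(schema="a", identifier="b")}

# The text format works with any object in the event data.
text_line = format_text("Building catalog", event_data)

# The JSON format raises TypeError for the same event.
try:
    format_json("Building catalog", event_data)
    json_error = None
except TypeError as exc:
    json_error = str(exc)

print(text_line)
print(json_error)
```

The same event formats fine as text and fails as JSON, which matches the symptom reported above.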
I'm able to reproduce this locally with the latest dbt-databricks + dbt-core. I'm trying to figure out where in the methods called by docs generate a DatabricksRelation is being passed directly into a log message, and then formatted into JSON, without any intermediate serialization steps.
BaseRelation (and thereby SparkRelation + DatabricksRelation) are not directly JSON serializable, but they can be converted to dictionaries that are, via the to_dict() method of the parent class dbtClassMixin:
>>> import json
>>> from dbt.adapters.base import BaseRelation
>>> json.dumps(BaseRelation.create(schema='a', identifier='b').to_dict())
'{"path": {"database": null, "schema": "a", "identifier": "b"}, "type": null, "quote_character": "\\"", "include_policy": {"database": true, "schema": true, "identifier": true}, "quote_policy": {"database": true, "schema": true, "identifier": true}, "dbt_created": false}'
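One general way to apply that conversion at logging time is the default hook of json.dumps, which is called only for objects the encoder cannot handle. The sketch below is an illustration under assumptions, not necessarily the fix that was made in dbt-core: FakeRelation is a hypothetical stand-in that exposes to_dict(), and to_jsonable is a hypothetical helper name.

```python
import json


class FakeRelation:
    """Hypothetical stand-in exposing to_dict(), like dbt's relation classes."""

    def __init__(self, schema, identifier):
        self.schema = schema
        self.identifier = identifier

    def to_dict(self):
        return {"schema": self.schema, "identifier": self.identifier}


def to_jsonable(obj):
    # json.dumps calls this only for objects it cannot serialize itself.
    to_dict = getattr(obj, "to_dict", None)
    if callable(to_dict):
        return to_dict()
    # Last resort: fall back to str() so that logging never raises.
    return str(obj)


event = {"msg": "Building catalog", "relation": FakeRelation("a", "b")}
line = json.dumps(event, default=to_jsonable)
print(line)
```

The str() fallback trades fidelity for robustness: a log line with a less structured value is usually preferable to a command that fails because of logging.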
I found a simple fix for this, but I'd be curious to get @nathaniel-may's take on it before merging.
@jtcohen6 any update on this? I'd love to close this out in the next point release.
@bilalaslamseattle Thanks for flagging this again.
We've seen multiple issues in this category, across multiple adapters, and we think there exists a general-purpose solution that will be the right move longer-term: https://github.com/dbt-labs/dbt-core/issues/5436
The work for that is definitely on our radar. If it appears that the general-purpose resolution will be too complex, we can put a one-off patch for this in dbt-spark, to unblock the user here.
(cc @nathaniel-may)
Hiya people on the thread.
Per this core PR, this JSON serialization bug should be solved across all adapters. There are a lot of layers of indirection in the logger call stack, so finding the root cause of this error took us some concerted time. I also threw on our PRs backlog tags, so in theory, you should be able to "seamlessly" integrate the fix into your env on the new release. (It's also live in main.)
I'd love to close this if (🤞) people report things working here.