Error logs on Databricks Runtime 11.3 LTS do not display correctly.
Expected Behavior
When looking at the Databricks UI, the error message and its stack trace are displayed in a readable way, without escape characters.
Current Behavior
When running the job on Databricks Runtime 11.3 LTS, the error message shown in the UI is cluttered with ANSI escape characters:
== SQL ==
this table doesn't exist
-----^^^
[0;31m---------------------------------------------------------------------------[0m
[0;31mParseException[0m Traceback (most recent call last)
[0;32m<command--1>[0m in [0;36m<cell line: 13>[0;34m()[0m
[1;32m 12[0m [0;34m[0m[0m
[1;32m 13[0m [0;32mwith[0m [0mopen[0m[0;34m([0m[0mfilename[0m[0;34m,[0m [0;34m"rb"[0m[0;34m)[0m [0;32mas[0m [0mf[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0;32m---> 14[0;31m [0mexec[0m[0;34m([0m[0mcompile[0m[0;34m([0m[0mf[0m[0;34m.[0m[0mread[0m[0;34m([0m[0;34m)[0m[0;34m,[0m [0mfilename[0m[0;34m,[0m [0;34m'exec'[0m[0;34m)[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m 15[0m [0;34m[0m[0m
[0;32m/tmp/tmpxxsbdj9b.py[0m in [0;36m<module>[0;34m[0m
[1;32m 7[0m [0;34m[0m[0m
[1;32m 8[0m [0;32mif[0m [0m__name__[0m [0;34m==[0m [0;34m"__main__"[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0;32m----> 9[0;31m [0mentrypoint[0m[0;34m([0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m
[0;32m/tmp/tmpxxsbdj9b.py[0m in [0;36mentrypoint[0;34m()[0m
[1;32m 4[0m [0;32mdef[0m [0mentrypoint[0m[0;34m([0m[0;34m)[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[1;32m 5[0m [0mspark[0m [0;34m=[0m [0mSparkSession[0m[0;34m.[0m[0mbuilder[0m[0;34m.[0m[0mgetOrCreate[0m[0;34m([0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;32m----> 6[0;31m [0mspark[0m[0;34m.[0m[0mtable[0m[0;34m([0m[0;34m"this table doesn't exist"[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m 7[0m [0;34m[0m[0m
[1;32m 8[0m [0;32mif[0m [0m__name__[0m [0;34m==[0m [0;34m"__main__"[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0;32m/databricks/spark/python/pyspark/instrumentation_utils.py[0m in [0;36mwrapper[0;34m(*args, **kwargs)[0m
[1;32m 46[0m [0mstart[0m [0;34m=[0m [0mtime[0m[0;34m.[0m[0mperf_counter[0m[0;34m([0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[1;32m 47[0m [0;32mtry[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0;32m---> 48[0;31m [0mres[0m [0;34m=[0m [0mfunc[0m[0;34m([0m[0;34m*[0m[0margs[0m[0;34m,[0m [0;34m**[0m[0mkwargs[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m 49[0m logger.log_success(
[1;32m 50[0m [0mmodule_name[0m[0;34m,[0m [0mclass_name[0m[0;34m,[0m [0mfunction_name[0m[0;34m,[0m [0mtime[0m[0;34m.[0m[0mperf_counter[0m[0;34m([0m[0;34m)[0m [0;34m-[0m [0mstart[0m[0;34m,[0m [0msignature[0m[0;34m[0m[0;34m[0m[0m
[0;32m/databricks/spark/python/pyspark/sql/session.py[0m in [0;36mtable[0;34m(self, tableName)[0m
[1;32m 1138[0m [0;32mTrue[0m[0;34m[0m[0;34m[0m[0m
[1;32m 1139[0m """
[0;32m-> 1140[0;31m [0;32mreturn[0m [0mDataFrame[0m[0;34m([0m[0mself[0m[0;34m.[0m[0m_jsparkSession[0m[0;34m.[0m[0mtable[0m[0;34m([0m[0mtableName[0m[0;34m)[0m[0;34m,[0m [0mself[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m 1141[0m [0;34m[0m[0m
[1;32m 1142[0m [0;34m@[0m[0mproperty[0m[0;34m[0m[0;34m[0m[0m
[0;32m/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py[0m in [0;36m__call__[0;34m(self, *args)[0m
[1;32m 1319[0m [0;34m[0m[0m
[1;32m 1320[0m [0manswer[0m [0;34m=[0m [0mself[0m[0;34m.[0m[0mgateway_client[0m[0;34m.[0m[0msend_command[0m[0;34m([0m[0mcommand[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;32m-> 1321[0;31m return_value = get_return_value(
[0m[1;32m 1322[0m answer, self.gateway_client, self.target_id, self.name)
[1;32m 1323[0m [0;34m[0m[0m
[0;32m/databricks/spark/python/pyspark/sql/utils.py[0m in [0;36mdeco[0;34m(*a, **kw)[0m
[1;32m 200[0m [0;31m# Hide where the exception came from that shows a non-Pythonic[0m[0;34m[0m[0;34m[0m[0;34m[0m[0m
[1;32m 201[0m [0;31m# JVM exception message.[0m[0;34m[0m[0;34m[0m[0;34m[0m[0m
[0;32m--> 202[0;31m [0;32mraise[0m [0mconverted[0m [0;32mfrom[0m [0;32mNone[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m 203[0m [0;32melse[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[1;32m 204[0m [0;32mraise[0m[0;34m[0m[0;34m[0m[0m
[0;31mParseException[0m:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'table'(line 1, pos 5)
== SQL ==
this table doesn't exist
-----^^^
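The bracketed fragments such as [0;31m and [0m look like ANSI colour codes (SGR sequences) with the leading escape byte lost, presumably emitted by a coloured, IPython-style traceback formatter in the newer runtime. As a temporary workaround on our side we strip them before surfacing the message; a minimal sketch, assuming the raw message is available as a plain Python string (strip_ansi is just an illustrative helper, not part of dbx):

import re

# Matches ANSI SGR sequences such as "\x1b[0;31m", and also the bare "[0;31m"
# variants seen above where the ESC byte has already been dropped.
ANSI_SGR = re.compile(r"(?:\x1b)?\[[0-9;]*m")

def strip_ansi(text: str) -> str:
    """Remove ANSI colour codes so the traceback reads as plain text."""
    return ANSI_SGR.sub("", text)

This only masks the symptom, of course; ideally the colour codes would not end up in the UI output in the first place.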
Steps to Reproduce (for bugs)
- Create a script which fails and configure it to run on two different Databricks Runtime versions: 11.3 LTS and 10.4 LTS.
- Run the dbx deploy command.
- Execute the workflow using the UI.
- Observe how the error message is displayed.
Context
I've noticed that there is a problem with how the error message is displayed on Databricks Runtime 11.3 LTS. To verify this, here is an example setup:
Parts of the deployment file:
custom:
  cluster-11-3: &cluster-11-3
    new_cluster:
      spark_version: "11.3.x-scala2.12"
      num_workers: 1
      node_type_id: "i3.xlarge"
      aws_attributes:
        ...[REDACTED]...

  cluster-10-4: &cluster-10-4
    new_cluster:
      spark_version: "10.4.x-scala2.12"
      num_workers: 1
      node_type_id: "i3.xlarge"
      aws_attributes:
        ...[REDACTED]...

build:
  no_build: true

environments:
  default:
    workflows:
      - name: "run-python-task"
        tasks:
          - task_key: "run-11-3"
            <<: *cluster-11-3
            spark_python_task:
              python_file: "file://cicd_sample_project/main.py"
              parameters: []
          - task_key: "run-10-4"
            <<: *cluster-10-4
            spark_python_task:
              python_file: "file://cicd_sample_project/main.py"
              parameters: []
Content of the cicd_sample_project/main.py file:
from pyspark.sql import SparkSession


def entrypoint():
    spark = SparkSession.builder.getOrCreate()
    spark.table("this table doesn't exist")


if __name__ == "__main__":
    entrypoint()
setup.py file:
"""
This file configures the Python package with entrypoints used for future runs on Databricks.
Please follow the `entry_points` documentation for more details on how to configure the entrypoint:
* https://setuptools.pypa.io/en/latest/userguide/entry_point.html
"""
from setuptools import find_packages, setup
from cicd_sample_project import __version__
PACKAGE_REQUIREMENTS = ["pyyaml"]
# packages for local development and unit testing
# please note that these packages are already available in DBR, there is no need to install them on DBR.
LOCAL_REQUIREMENTS = [
"pyspark==3.2.1",
"delta-spark==1.1.0",
]
TEST_REQUIREMENTS = [
# development & testing tools
"dbx>=0.8,<0.9"
]
setup(
name="cicd_sample_project",
packages=find_packages(exclude=["tests", "tests.*"]),
setup_requires=["setuptools","wheel"],
install_requires=PACKAGE_REQUIREMENTS,
extras_require={"local": LOCAL_REQUIREMENTS, "test": TEST_REQUIREMENTS},
entry_points = {
"console_scripts": [
]},
version=__version__,
description="",
author="",
)
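For comparison, running the same failing call through a local Spark session (using the pyspark==3.2.1 pin from the local extras above) should raise a plain ParseException whose message contains no escape sequences, which would point at the colour codes being added by the runtime's traceback rendering rather than by Spark itself. A quick, illustrative check, assuming a local Spark can be started (the local[1] master and the print are only for demonstration):

from pyspark.sql import SparkSession
from pyspark.sql.utils import ParseException

spark = SparkSession.builder.master("local[1]").getOrCreate()
try:
    spark.table("this table doesn't exist")
except ParseException as exc:
    # Locally the message is plain text, with no [0;31m-style colour codes.
    print(repr(str(exc)))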
Your Environment
- dbx version used:
  cicd-sample-project git:(main) ✗ dbx --version
  [dbx][2022-11-17 13:35:01.971] 🧱 Databricks eXtensions aka dbx, version ~> 0.8.7
- Databricks Runtime version:
  - 11.3 LTS
  - 10.4 LTS
- python venv:
  (.venv) ➜ cicd-sample-project git:(main) ✗ pip freeze
  aiohttp==3.8.3 aiosignal==1.3.1 arrow==1.2.3 async-timeout==4.0.2 attrs==22.1.0 binaryornot==0.4.4 certifi==2022.9.24 cffi==1.15.1 chardet==5.0.0 charset-normalizer==2.1.1
  # Editable Git install with no remote (cicd-sample-project==0.0.1)
  -e REDACTED/cicd-sample-project
  click==8.1.3 cloudpickle==2.2.0 colorama==0.4.6 commonmark==0.9.1 cookiecutter==2.1.1 cryptography==38.0.3 databricks-cli==0.17.3 dbx==0.8.7 decorator==5.1.1 delta-spark==1.1.0
  entrypoints==0.4 frozenlist==1.3.3 gitdb==4.0.9 GitPython==3.1.29 idna==3.4 importlib-metadata==5.0.0 Jinja2==3.1.2 jinja2-time==0.2.0 MarkupSafe==2.1.1 mlflow-skinny==2.0.0
  multidict==6.0.2 oauthlib==3.2.2 packaging==21.3 pathspec==0.10.2 protobuf==4.21.9 py==1.11.0 py4j==0.10.9.3 pycparser==2.21 pydantic==1.10.2 Pygments==2.13.0
  PyJWT==2.6.0 pyparsing==3.0.9 pyspark==3.2.1 python-dateutil==2.8.2 python-slugify==6.1.2 pytz==2022.6 PyYAML==6.0 requests==2.28.1 retry==0.9.2 rich==12.6.0
  shellingham==1.5.0 six==1.16.0 smmap==5.0.0 sqlparse==0.4.3 tabulate==0.9.0 text-unidecode==1.3 typer==0.7.0 typing_extensions==4.4.0 urllib3==1.26.12 watchdog==2.1.9
  yarl==1.8.1 zipp==3.10.0
- Local OS info:
  MacOS 12.6.1 (21G217)
hi @Squaess, thanks a lot for opening the issue. I'll try to repro and see what causes it.
The same thing happens on DBR 11.0 and 11.1 ML.