
Error logs on Databricks Runtime 11.3 LTS do not display correctly.

Squaess opened this issue · 2 comments

Expected Behavior

When looking at the Databricks UI, the error message with the stack trace is displayed clearly.

Current Behavior

When running the job on Databricks Runtime 11.3 LTS, the error message in the UI contains raw ANSI escape sequences:

== SQL ==
this table doesn't exist
-----^^^
[0;31m---------------------------------------------------------------------------[0m
[0;31mParseException[0m                            Traceback (most recent call last)
[0;32m<command--1>[0m in [0;36m<cell line: 13>[0;34m()[0m
[1;32m     12[0m [0;34m[0m[0m
[1;32m     13[0m [0;32mwith[0m [0mopen[0m[0;34m([0m[0mfilename[0m[0;34m,[0m [0;34m"rb"[0m[0;34m)[0m [0;32mas[0m [0mf[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0;32m---> 14[0;31m   [0mexec[0m[0;34m([0m[0mcompile[0m[0;34m([0m[0mf[0m[0;34m.[0m[0mread[0m[0;34m([0m[0;34m)[0m[0;34m,[0m [0mfilename[0m[0;34m,[0m [0;34m'exec'[0m[0;34m)[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m     15[0m [0;34m[0m[0m

[0;32m/tmp/tmpxxsbdj9b.py[0m in [0;36m<module>[0;34m[0m
[1;32m      7[0m [0;34m[0m[0m
[1;32m      8[0m [0;32mif[0m [0m__name__[0m [0;34m==[0m [0;34m"__main__"[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0;32m----> 9[0;31m     [0mentrypoint[0m[0;34m([0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m
[0;32m/tmp/tmpxxsbdj9b.py[0m in [0;36mentrypoint[0;34m()[0m
[1;32m      4[0m [0;32mdef[0m [0mentrypoint[0m[0;34m([0m[0;34m)[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[1;32m      5[0m     [0mspark[0m [0;34m=[0m [0mSparkSession[0m[0;34m.[0m[0mbuilder[0m[0;34m.[0m[0mgetOrCreate[0m[0;34m([0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;32m----> 6[0;31m     [0mspark[0m[0;34m.[0m[0mtable[0m[0;34m([0m[0;34m"this table doesn't exist"[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m      7[0m [0;34m[0m[0m
[1;32m      8[0m [0;32mif[0m [0m__name__[0m [0;34m==[0m [0;34m"__main__"[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m

[0;32m/databricks/spark/python/pyspark/instrumentation_utils.py[0m in [0;36mwrapper[0;34m(*args, **kwargs)[0m
[1;32m     46[0m             [0mstart[0m [0;34m=[0m [0mtime[0m[0;34m.[0m[0mperf_counter[0m[0;34m([0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[1;32m     47[0m             [0;32mtry[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0;32m---> 48[0;31m                 [0mres[0m [0;34m=[0m [0mfunc[0m[0;34m([0m[0;34m*[0m[0margs[0m[0;34m,[0m [0;34m**[0m[0mkwargs[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m     49[0m                 logger.log_success(
[1;32m     50[0m                     [0mmodule_name[0m[0;34m,[0m [0mclass_name[0m[0;34m,[0m [0mfunction_name[0m[0;34m,[0m [0mtime[0m[0;34m.[0m[0mperf_counter[0m[0;34m([0m[0;34m)[0m [0;34m-[0m [0mstart[0m[0;34m,[0m [0msignature[0m[0;34m[0m[0;34m[0m[0m

[0;32m/databricks/spark/python/pyspark/sql/session.py[0m in [0;36mtable[0;34m(self, tableName)[0m
[1;32m   1138[0m         [0;32mTrue[0m[0;34m[0m[0;34m[0m[0m
[1;32m   1139[0m         """
[0;32m-> 1140[0;31m         [0;32mreturn[0m [0mDataFrame[0m[0;34m([0m[0mself[0m[0;34m.[0m[0m_jsparkSession[0m[0;34m.[0m[0mtable[0m[0;34m([0m[0mtableName[0m[0;34m)[0m[0;34m,[0m [0mself[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m   1141[0m [0;34m[0m[0m
[1;32m   1142[0m     [0;34m@[0m[0mproperty[0m[0;34m[0m[0;34m[0m[0m

[0;32m/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py[0m in [0;36m__call__[0;34m(self, *args)[0m
[1;32m   1319[0m [0;34m[0m[0m
[1;32m   1320[0m         [0manswer[0m [0;34m=[0m [0mself[0m[0;34m.[0m[0mgateway_client[0m[0;34m.[0m[0msend_command[0m[0;34m([0m[0mcommand[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;32m-> 1321[0;31m         return_value = get_return_value(
[0m[1;32m   1322[0m             answer, self.gateway_client, self.target_id, self.name)
[1;32m   1323[0m [0;34m[0m[0m

[0;32m/databricks/spark/python/pyspark/sql/utils.py[0m in [0;36mdeco[0;34m(*a, **kw)[0m
[1;32m    200[0m                 [0;31m# Hide where the exception came from that shows a non-Pythonic[0m[0;34m[0m[0;34m[0m[0;34m[0m[0m
[1;32m    201[0m                 [0;31m# JVM exception message.[0m[0;34m[0m[0;34m[0m[0;34m[0m[0m
[0;32m--> 202[0;31m                 [0;32mraise[0m [0mconverted[0m [0;32mfrom[0m [0;32mNone[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m    203[0m             [0;32melse[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[1;32m    204[0m                 [0;32mraise[0m[0;34m[0m[0;34m[0m[0m

[0;31mParseException[0m: 
[PARSE_SYNTAX_ERROR] Syntax error at or near 'table'(line 1, pos 5)

== SQL ==
this table doesn't exist
-----^^^
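
For reference, the `[0;31m`-style fragments above are ANSI SGR color codes, which suggests the traceback is being rendered by IPython's colorized formatter; the leading ESC (0x1b) byte appears to be dropped by the UI, leaving only the bracketed remnants visible. If the raw log is captured programmatically, something like the following could strip them (a minimal sketch; strip_ansi is a hypothetical helper, not part of dbx, and it assumes the captured text still contains the ESC byte):

import re

# ANSI SGR sequences have the form ESC [ <params> m, e.g. "\x1b[0;31m" (red).
# The UI output above shows them without the ESC byte, i.e. as "[0;31m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text: str) -> str:
    """Strip ANSI color codes from a captured log string (hypothetical helper)."""
    return ANSI_SGR.sub("", text)


print(strip_ansi("\x1b[0;31mParseException\x1b[0m"))  # -> "ParseException"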

Steps to Reproduce (for bugs)

  1. Create a script that fails and configure it to run on two different Databricks Runtime versions: 11.3 LTS and 10.4 LTS.
  2. Run the dbx deploy command.
  3. Execute the workflow using the UI.
  4. Observe how the error message is displayed.

Context

I've noticed a problem with how the error message is displayed on Databricks Runtime 11.3 LTS. To verify this, here is an example setup:

Parts of the deployment file:

custom:
  cluster-11-3: &cluster-11-3
    new_cluster:
      spark_version: "11.3.x-scala2.12"
      num_workers: 1
      node_type_id: "i3.xlarge"
      aws_attributes:
        ...[REDACTED]...

  cluster-10-4: &cluster-10-4
    new_cluster:
      spark_version: "10.4.x-scala2.12"
      num_workers: 1
      node_type_id: "i3.xlarge"
      aws_attributes:
        ...[REDACTED]...

build:
  no_build: true

environments:
  default:
    workflows:
      - name: "run-python-task"
        tasks:
          - task_key: "run-11-3"
            <<: *cluster-11-3
            spark_python_task:
              python_file: "file://cicd_sample_project/main.py"
              parameters: []
          - task_key: "run-10-4"
            <<: *cluster-10-4
            spark_python_task:
              python_file: "file://cicd_sample_project/main.py"
              parameters: []

Content of the cicd_sample_project/main.py file:

from pyspark.sql import SparkSession


def entrypoint():
    spark = SparkSession.builder.getOrCreate()
    spark.table("this table doesn't exist")

if __name__ == "__main__":
    entrypoint()
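
As a possible user-side workaround until this is resolved, the entrypoint could catch the exception and emit a plain, uncolored traceback itself (a minimal sketch, not from the original report; whether the surrounding IPython layer still re-renders the failure afterwards is an open question):

import sys
import traceback

from pyspark.sql import SparkSession


def entrypoint():
    spark = SparkSession.builder.getOrCreate()
    spark.table("this table doesn't exist")


if __name__ == "__main__":
    try:
        entrypoint()
    except Exception:
        # Print the traceback without any ANSI color codes, then exit
        # non-zero so the Databricks job run is still marked as failed.
        traceback.print_exc(file=sys.stderr)
        sys.exit(1)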

setup.py file:

"""
This file configures the Python package with entrypoints used for future runs on Databricks.

Please follow the `entry_points` documentation for more details on how to configure the entrypoint:
* https://setuptools.pypa.io/en/latest/userguide/entry_point.html
"""

from setuptools import find_packages, setup
from cicd_sample_project import __version__

PACKAGE_REQUIREMENTS = ["pyyaml"]

# packages for local development and unit testing
# please note that these packages are already available in DBR; there is no need to install them on DBR.
LOCAL_REQUIREMENTS = [
    "pyspark==3.2.1",
    "delta-spark==1.1.0",
]

TEST_REQUIREMENTS = [
    # development & testing tools
    "dbx>=0.8,<0.9"
]

setup(
    name="cicd_sample_project",
    packages=find_packages(exclude=["tests", "tests.*"]),
    setup_requires=["setuptools","wheel"],
    install_requires=PACKAGE_REQUIREMENTS,
    extras_require={"local": LOCAL_REQUIREMENTS, "test": TEST_REQUIREMENTS},
    entry_points={
        "console_scripts": [],
    },
    version=__version__,
    description="",
    author="",
)

Your Environment

  • dbx version used:

    cicd-sample-project git:(main) ✗ dbx --version
    [dbx][2022-11-17 13:35:01.971] 🧱 Databricks eXtensions aka dbx, version ~> 0.8.7
    
  • Databricks Runtime version:

    • 11.3 LTS
    • 10.4 LTS
  • python venv:

    (.venv) ➜  cicd-sample-project git:(main) ✗ pip freeze
    aiohttp==3.8.3
    aiosignal==1.3.1
    arrow==1.2.3
    async-timeout==4.0.2
    attrs==22.1.0
    binaryornot==0.4.4
    certifi==2022.9.24
    cffi==1.15.1
    chardet==5.0.0
    charset-normalizer==2.1.1
    # Editable Git install with no remote (cicd-sample-project==0.0.1)
    -e REDACTED/cicd-sample-project
    click==8.1.3
    cloudpickle==2.2.0
    colorama==0.4.6
    commonmark==0.9.1
    cookiecutter==2.1.1
    cryptography==38.0.3
    databricks-cli==0.17.3
    dbx==0.8.7
    decorator==5.1.1
    delta-spark==1.1.0
    entrypoints==0.4
    frozenlist==1.3.3
    gitdb==4.0.9
    GitPython==3.1.29
    idna==3.4
    importlib-metadata==5.0.0
    Jinja2==3.1.2
    jinja2-time==0.2.0
    MarkupSafe==2.1.1
    mlflow-skinny==2.0.0
    multidict==6.0.2
    oauthlib==3.2.2
    packaging==21.3
    pathspec==0.10.2
    protobuf==4.21.9
    py==1.11.0
    py4j==0.10.9.3
    pycparser==2.21
    pydantic==1.10.2
    Pygments==2.13.0
    PyJWT==2.6.0
    pyparsing==3.0.9
    pyspark==3.2.1
    python-dateutil==2.8.2
    python-slugify==6.1.2
    pytz==2022.6
    PyYAML==6.0
    requests==2.28.1
    retry==0.9.2
    rich==12.6.0
    shellingham==1.5.0
    six==1.16.0
    smmap==5.0.0
    sqlparse==0.4.3
    tabulate==0.9.0
    text-unidecode==1.3
    typer==0.7.0
    typing_extensions==4.4.0
    urllib3==1.26.12
    watchdog==2.1.9
    yarl==1.8.1
    zipp==3.10.0
    
  • Local OS info: macOS 12.6.1 (21G217)

Squaess avatar Nov 17 '22 12:11 Squaess

Hi @Squaess, thanks a lot for opening the issue. I'll try to repro and see what causes it.

renardeinside avatar Nov 17 '22 13:11 renardeinside

The same thing happens on DBR 11.0 and 11.1 ML.

goldstein0101 avatar Nov 20 '22 13:11 goldstein0101