databricks-sdk-py

SyntaxError: EOL while scanning string literal

Open • ConstantinoSchillebeeckx opened this issue 2 years ago • 8 comments

Description

We intermittently see the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.9/pkgutil.py", line 108, in walk_packages
    yield from walk_packages(path, info.name+'.', onerror)
  File "/usr/lib/python3.9/pkgutil.py", line 108, in walk_packages
    yield from walk_packages(path, info.name+'.', onerror)
  File "/usr/lib/python3.9/pkgutil.py", line 93, in walk_packages
    __import__(info.name)
  File "/databricks/python3/lib/python3.9/site-packages/dbt/adapters/databricks/__init__.py", line 1, in <module>
    from dbt.adapters.databricks.connections import DatabricksConnectionManager  # noqa
  File "/databricks/python3/lib/python3.9/site-packages/dbt/adapters/databricks/connections.py", line 47, in <module>
    from databricks import sql as dbsql
  File "/databricks/python3/lib/python3.9/site-packages/databricks/__init__.py", line 3
    __path__ = __import__('pkgutil').extend_path(__path__, __name__)").extend_path(__path__, __name__)
                                                                                                      ^
SyntaxError: EOL while scanning string literal

We've seen this mostly in CI, where we build our Docker container and, as a sanity check, import our code to make sure everything is kosher. Most recently, it happened while we were running pytest.

The line in the traceback has evidently been modified, since it differs from the one in the repository: https://github.com/databricks/databricks-sdk-py/blob/cef100c3a3c9dce91bc2bc8d6c59a93febd8f707/databricks/__init__.py#L3
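Because the corrupted file only surfaces when something happens to import it, a CI sanity check could instead parse the installed __init__.py directly right after the install step. The following is a minimal sketch; init_file_compiles is a hypothetical helper, not part of any Databricks package. It uses importlib.util.find_spec to locate a top-level package without executing it, then compile() to parse the source, so a torn write shows up as a SyntaxError together with the file path.

```python
import importlib.util
import sys


def init_file_compiles(package: str) -> bool:
    """Check that a top-level package's __init__.py parses, without importing it.

    Hypothetical CI helper (not part of any Databricks package).
    """
    # find_spec locates the package on sys.path but does not execute __init__.py
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.has_location or spec.origin is None:
        print(f"{package}: no __init__.py found", file=sys.stderr)
        return False
    with open(spec.origin, "rb") as f:
        source = f.read()
    try:
        # compile() only parses and byte-compiles; a corrupted file raises SyntaxError
        compile(source, spec.origin, "exec")
    except SyntaxError as exc:
        print(f"{package}: {spec.origin} does not parse: {exc}", file=sys.stderr)
        return False
    return True
```

For example, calling init_file_compiles("databricks") right after the dependency install would fail fast and name the corrupted file, instead of failing deep inside an unrelated import chain.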

Reproduction

Unfortunately, I'm struggling to reproduce this 😢

Expected behavior

No SyntaxError.

Debug Logs

Can't generate them since I can't reproduce the issue.

Other Information

  • OS: Ubuntu 20.04.6
  • Version: 0.6.0

Thanks for reporting. It seems something in your setup is adding a double quote to that line, which produces the error.

Our version of __init__.py looks like this: https://github.com/databricks/databricks-sdk-py/blob/986d1d98d4fea66c99d0ea6ccfc64b9faa1115db/databricks/__init__.py#L3

The line in your error contains an extra double quote, which is what produces "EOL while scanning string literal".

Closing because this is not an SDK issue.

pietern • Sep 13 '23 13:09

Edit: I originally wrote that running poetry lock --no-update seemed to resolve the issue, but it didn't. We've kept seeing this intermittently; the only workaround is to rerun CI.

I'm running into this in CI as well when running pytest, though only very intermittently. I've run poetry lock --no-update several times over the past few months, but the issue still occurs now and then.

In Python 3.10 the error looks like this:

    from databricks.sqlalchemy import dialect as db_types
E     File "/data/github/actions-runner-11/_work/redacted/.venv/lib/python3.10/site-packages/databricks/__init__.py", line 3
E       __path__ = __import__('pkgutil').extend_path(__path__, __name__)").extend_path(__path__, __name__)
E                                                                       ^
E   SyntaxError: unterminated string literal (detected at line 3)

travischambers • Dec 21 '23 00:12

Thanks for chiming in, @travischambers.

I now notice that the imports in the two traces differ:

  • from databricks import sql as dbsql
  • from databricks.sqlalchemy import dialect as db_types

That line is what allows multiple packages to share the databricks namespace.
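For context, this is a pkgutil-style namespace package: each distribution ships a databricks/__init__.py containing that line, and pkgutil.extend_path() appends every other sys.path directory that also has a databricks/ subdirectory to the package's __path__. The sketch below reproduces the mechanism with a hypothetical namespace demo_ns and two temporary directories standing in for two installed distributions; it illustrates the mechanism only and is not how the SDK or the SQL connector are actually packaged. Without the extend_path line, only the first directory would be searched and importing demo_ns.sql_part would fail with ModuleNotFoundError.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical namespace name used only for this demo (not "databricks").
NS = "demo_ns"
INIT_LINE = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def make_portion(root: str, module: str, value: str) -> None:
    """Create <root>/demo_ns/__init__.py plus one submodule, as one distribution would."""
    pkg_dir = os.path.join(root, NS)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT_LINE)
    with open(os.path.join(pkg_dir, f"{module}.py"), "w") as f:
        f.write(f"VALUE = {value!r}\n")


# Two separate directories stand in for two installed distributions.
dir_a = tempfile.mkdtemp()
dir_b = tempfile.mkdtemp()
make_portion(dir_a, "sdk_part", "from the first distribution")
make_portion(dir_b, "sql_part", "from the second distribution")
sys.path[:0] = [dir_a, dir_b]
importlib.invalidate_caches()

# demo_ns/__init__.py from dir_a runs first; extend_path adds dir_b/demo_ns to __path__.
sdk_part = importlib.import_module(f"{NS}.sdk_part")
sql_part = importlib.import_module(f"{NS}.sql_part")
print(sdk_part.VALUE)
print(sql_part.VALUE)
print(sys.modules[NS].__path__)
```

When both distributions install into the same site-packages, they write to the very same databricks/__init__.py path, which is why their copies of this file can collide.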

But perhaps one of these packages uses slightly different contents for this file, and when the installations happen at the same time, they clobber or partially overwrite each other's contents. I'm taking a look at the __init__.py contents of the other packages now.

pietern • Dec 21 '23 10:12

Alright, so it turns out that:

  • __init__.py in this repository is 260 bytes
  • __init__.py in https://github.com/databricks/databricks-sql-python is 295 bytes

The contents of the latter beyond the first 260 bytes are:

").extend_path(__path__, __name__)

This is exactly the trailing text we see in the error messages, which turns the line into a syntax error.

I suspect what happens is:

  1. Installer for databricks-sdk (SDK) opens the file
  2. Installer for databricks-sql-python (SQL) opens the file
  3. SQL writes its contents (295 bytes)
  4. SQL closes the file
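  5. SDK writes its contents (260 bytes) from the start of the file; the file is not truncated again because SDK already opened (and truncated) it in step 1, so the last 35 bytes of SQL's contents are left in place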
  6. SDK closes the file

This only happens if the timing of the (supposedly) parallel installer lines up exactly, which would explain why it is intermittent.
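The interleaving above can be replayed deterministically with two file handles. This is a minimal sketch under stated assumptions: SHORT and LONG are hypothetical stand-ins (not the real 260- and 295-byte files), the two open() calls play the roles of the SDK and SQL installers, and opening with "wb" truncates the file, as a fresh install would. Because the SDK's handle was opened before SQL wrote, its write lands at offset 0 and nothing truncates the leftover tail.

```python
import os
import tempfile

# Hypothetical stand-ins, NOT the real databricks/__init__.py contents:
# one writer produces a shorter file, the other a longer one.
SHORT = b"__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"
LONG = b"# a longer variant of the same file\n" + SHORT


def simulate_interleaved_install(first: bytes, second: bytes) -> bytes:
    """Replay the six steps: `first` goes through a handle opened before `second` was written."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "__init__.py")
        sdk = open(path, "wb")  # 1. SDK installer opens (and truncates) the file
        sql = open(path, "wb")  # 2. SQL installer opens (and truncates) the file
        sql.write(second)       # 3. SQL writes its contents (buffered until close)
        sql.close()             # 4. SQL closes the file, flushing its bytes to disk
        sdk.write(first)        # 5. SDK writes from offset 0 (buffered until close)
        sdk.close()             # 6. SDK closes the file; bytes past len(first) survive
        with open(path, "rb") as f:
            return f.read()


result = simulate_interleaved_install(SHORT, LONG)
print(len(SHORT), len(LONG), len(result))
print(result.decode())
```

The result is SHORT followed by LONG[len(SHORT):], the same "shorter file plus stale trailer" shape as in the tracebacks. Passing identical contents for both writers, e.g. simulate_interleaved_install(SHORT, SHORT), leaves the file intact, which is why making the two files match hides the problem.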

I looked up the Poetry configuration, and it does use parallel installation by default. You can disable installer parallelism with the installer.parallel setting (see docs). Could you check whether this solves the issue?

We can make the contents of these files identical to hide the issue, but that won't fix it for versions that have already been released.

pietern • Dec 21 '23 10:12

Damn, that's some next-level debugging! Thanks for investigating.

We could turn off the parallel install, but this happens so infrequently (perhaps a handful of times a month) that it could take a while to get feedback. Even then, it wouldn't be clear whether the updated setting actually resolved anything or whether the issue simply hadn't recurred yet.

The file content update seems reasonable, and we can just wait for that update.

Thanks for getting back! Yes, once these fixes to the SDK and SQL connector packages are merged and released, and you update to those versions, the issue should stop happening.

pietern • Dec 21 '23 16:12

Thanks for the quick debugging! We have ~10 different GitHub Actions workflows and we run into this daily. I have run poetry config installer.parallel false and will monitor whether we hit this failure again.

For our install, with 422 packages, turning off the parallel installer wasn't a large performance hit either (about 20% slower):

  • Average install time with installer.parallel true: 80 s
  • Average install time with installer.parallel false: 96 s

travischambers • Dec 21 '23 16:12