
[BUG] Verify databricks notebook init scripts report success in spite of error

Open cpbotha opened this issue 6 months ago • 4 comments

Describe the bug

I am able to run Python files on Databricks via the vscode databricks extension, but notebook cells run only locally, not on Databricks as the documentation says they should.

After some digging around in the logs, I finally found [TerminalIPythonApp] WARNING | Unknown error in handling startup files and the message Notebook Init Script Error in the vscode Databricks Logs.

Frustratingly, the databricks extension command "verify databricks notebook init scripts" reports success! See screenshot below.

To Reproduce

Run the command "Verify databricks notebook init scripts".

Screenshots

(Screenshot: the verify command reporting success)

System information:

  1. Paste the output of the Help: About command (CMD-Shift-P).
  2. Databricks Extension Version
Version: 1.100.3 (Universal)
Commit: 258e40fedc6cb8edf399a463ce3a9d32e7e1f6f3
Date: 2025-06-02T13:30:54.273Z
Electron: 34.5.1
ElectronBuildId: 11369351
Chromium: 132.0.6834.210
Node.js: 20.19.0
V8: 13.2.152.41-electron.0
OS: Darwin arm64 24.5.0

Databricks extension version: 2.9.4

Databricks Extension Logs

databricks-cli-logs.json sdk-and-extension-logs.json

cpbotha, Jun 09 '25 10:06

Continuing the digging, I executed ipython in my Python environment, which revealed that the underlying problem was the missing msal package. Please see output below.

It would of course be MUCH better if the verify command could surface this traceback into the UI.

In addition, documentation should probably mention the msal dependency.

uv run ipython
Python 3.11.13 (main, Jun  3 2025, 18:38:25) [Clang 17.0.0 (clang-1700.0.13.3)]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.3.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: You can use LaTeX or Unicode completion, `\alpha<tab>` will insert the α symbol.
<module '__main__' from '/Users/charlbotha/.ipython/profile_default/startup/00-databricks-init-cdb6174794c74fbc909de1ce43bfe286.py'>
[TerminalIPythonApp] WARNING | Unknown error in handling startup files:
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File ~/.virtualenvs/databricks-azure-infra/lib/python3.11/site-packages/IPython/core/shellapp.py:404, in InteractiveShellApp._exec_file(self, fname, shell_futures)
    400                 self.shell.safe_execfile_ipy(full_filename,
    401                                              shell_futures=shell_futures)
    402             else:
    403                 # default to python, even without extension
--> 404                 self.shell.safe_execfile(full_filename,
    405                                          self.shell.user_ns,
    406                                          shell_futures=shell_futures,
    407                                          raise_exceptions=True)
    408 finally:
    409     sys.argv = save_argv

File ~/.virtualenvs/databricks-azure-infra/lib/python3.11/site-packages/IPython/core/interactiveshell.py:2906, in InteractiveShell.safe_execfile(self, fname, exit_ignore, raise_exceptions, shell_futures, *where)
   2904 try:
   2905     glob, loc = (where + (None, ))[:2]
-> 2906     py3compat.execfile(
   2907         fname, glob, loc,
   2908         self.compile if shell_futures else None)
   2909 except SystemExit as status:
   2910     # If the call was made with 0 or None exit status (sys.exit(0)
   2911     # or sys.exit() ), don't bother showing a traceback, as both of
   (...)   2917     # For other exit status, we show the exception unless
   2918     # explicitly silenced, but only in short form.
   2919     if status.code:

File ~/.virtualenvs/databricks-azure-infra/lib/python3.11/site-packages/IPython/utils/py3compat.py:56, in execfile(fname, glob, loc, compiler)
     54 with open(fname, "rb") as f:
     55     compiler = compiler or compile
---> 56     exec(compiler(f.read(), fname, "exec"), glob, loc)

File ~/.ipython/profile_default/startup/session_management.py:3
      1 import json
      2 import textwrap
----> 3 import msal
      4 import requests
      5 import time

ModuleNotFoundError: No module named 'msal'

cpbotha, Jun 09 '25 11:06

After the msal dependency issue was addressed, I could not understand why the spark global set up by the 00-databricks-init script was being clobbered back to None.

Some debug print statements later led me to the start of the init_lighter.py startup script by the Fabric / Synapse extension (I had deleted this extension before I started with databricks because I ran into the warnings about it; however, deleting the extension did not remove its ipython customization).

Although that script only activates if a Fabric kernel is detected, it starts with the following extremely inconsiderate code, which clobbers spark and the other globals.

One effective remedy is of course deleting init_lighter.py, but ideally the script would not start by blindly clobbering those globals!

sc = None
spark = None
sqlContext = None
spark_session_id = ''
create_spark_exception = None
script_version = '1.0.6'
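A sketch of a less destructive pattern the script could use instead (this is not the actual Fabric code): initialize each name only if no earlier startup script has set it, so values like the `spark` global from 00-databricks-init survive. The "existing session" string is a stand-in for a real SparkSession:

```python
spark = "existing session"  # stand-in for a value set by an earlier startup script

# Only fill in names that are not already defined, instead of
# unconditionally assigning None over whatever is there.
_defaults = {
    "sc": None,
    "spark": None,
    "sqlContext": None,
    "spark_session_id": "",
    "create_spark_exception": None,
}
for _name, _value in _defaults.items():
    globals().setdefault(_name, _value)

script_version = "1.0.6"  # pure metadata, harmless to set unconditionally

print(spark)  # the earlier value is preserved, not clobbered to None
```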

cpbotha, Jun 09 '25 12:06

Thanks for the report!

msal itself seems to be from another startup script installed by the Fabric extension.

What we should probably do on our side:

  • Detect Fabric scripts in the jupyter startup folder and show an error notification to users, with an option to delete the scripts
  • Fix "verify" command (it doesn't check anything if you currently doesn't have a notebook editor active)

ilia-db, Jun 10 '25 08:06

Doh! You're completely right, msal was due to session_management.py which is also part of the fabric setup, which I did not realise at that point.

Indeed, having the verify command surface this sort of error to the user would be super helpful.

cpbotha, Jun 10 '25 11:06