databricks-vscode
[BUG] - ModuleNotFoundError when calling a function that uses a User Defined Function (UDF)
System information
- Runtime: Databricks-VSCode (Databricks Runtime 13.3.x Scala 2.12)
- PySpark version: 3.4.2
- Python version: 3.10.1
- Operating system: Windows 10 Build 19045
Code structure
repo/
├── helper/
│   ├── __init__.py
│   ├── helper_module.py
│   └── ...
├── notebooks/
│   ├── notebook.ipynb
│   └── ...
└── pyproject.toml
Code sample
# helper_module.py
# From the Python Standard Library
import struct
# From PySpark
import pyspark.sql.functions as F
import pyspark.sql.types as T
from pyspark.sql import DataFrame

def str_hex_to_numeric(
    hex_value: str,
    data_type_name: str
) -> float:
    """Convert a hex string to a numeric value."""
    if data_type_name == "Float":
        # Interpret the 4 bytes as a big-endian IEEE 754 single-precision float
        return struct.unpack('!f', bytes.fromhex(hex_value))[0]
    raise ValueError(f"Unknown data type: {data_type_name}")

def value_col_hex_to_numeric(
    df: DataFrame,
    value_col: str = "VALUE",
    data_type_name_col: str = "DATA_TYPE_NAME"
) -> DataFrame:
    """Convert the hex strings in a DataFrame column to numeric values."""
    # Wrap the plain Python function in a UDF and apply it row by row
    return df.withColumn(
        value_col,
        F.udf(
            str_hex_to_numeric, T.FloatType()
        )(F.col(value_col), F.col(data_type_name_col))
    )

# notebook.ipynb
# Navigate to the repo root directory and install the helper module
%pip install -e .
# Import the helper module
from helper import helper_module
# Create a Spark DataFrame
df = spark.createDataFrame(
    [("1", "Float", "3f800000"), ("2", "Float", "40000000"),
     ("3", "Float", "40400000"), ("4", "Float", "40800000")],
    ["INDEX", "DATA_TYPE_NAME", "VALUE"])
# Convert the hex string to a numeric value
df = helper_module.value_col_hex_to_numeric(df)
# Display the DataFrame
df.show()
# -- Databricks Connect returns the following error --
# ModuleNotFoundError: No module named 'helper'
#
# -- While Azure Databricks returns the expected output --
# +-----+--------------+----------+
# |INDEX|DATA_TYPE_NAME|     VALUE|
# +-----+--------------+----------+
# |    1|         Float|       1.0|
# |    2|         Float|       2.0|
# |    3|         Float|       3.0|
# |    4|         Float|       4.0|
# +-----+--------------+----------+
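
For reference, the expected values above can be reproduced locally without Spark, which suggests the conversion logic itself is fine and the failure lies in how the UDF reaches the cluster. This is only a minimal sketch: it repeats `str_hex_to_numeric` from `helper_module.py` and applies it to the same sample rows in plain Python. The `rows` and `converted` names are illustrative and not part of the repo.

```python
import struct


def str_hex_to_numeric(hex_value: str, data_type_name: str) -> float:
    """Convert a hex string to a numeric value (copied from helper_module.py)."""
    if data_type_name == "Float":
        # Big-endian IEEE 754 single-precision float
        return struct.unpack("!f", bytes.fromhex(hex_value))[0]
    raise ValueError(f"Unknown data type: {data_type_name}")


# Same sample data as in the notebook: (INDEX, DATA_TYPE_NAME, VALUE)
rows = [
    ("1", "Float", "3f800000"),
    ("2", "Float", "40000000"),
    ("3", "Float", "40400000"),
    ("4", "Float", "40800000"),
]

# Apply the conversion row by row, without Spark or a UDF
converted = [
    (index, data_type_name, str_hex_to_numeric(value, data_type_name))
    for index, data_type_name, value in rows
]

for row in converted:
    print(row)
```

This runs in a local Python session with no cluster attached.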