Cannot dbx execute code from tutorial

Open CarlaFernandez opened this issue 3 years ago • 1 comment
Expected Behavior

I should be able to execute the covid_analysis_etl_integ workflow from this tutorial on an existing cluster in interactive mode.
Current Behavior

I get the following stack trace:
[dbx][2022-09-27 08:53:35.009] 🔎 Deployment file is not provided, searching in the conf directory
[dbx][2022-09-27 08:53:35.014] 💡 Auto-discovery found deployment file conf\deployment.yml                                                             
[dbx][2022-09-27 08:53:35.015] 🆗 Deployment file conf\deployment.yml exists and will be used for deployment                                           
[dbx][2022-09-27 08:53:35.018] Legacy environment format is used in project file. Please take a look at the docs and upgrade to the new format version.
[dbx][2022-09-27 08:53:35.021] Using profile provided from the project file                                                                            
[dbx][2022-09-27 08:53:35.023] Found auth config from provider ProfileEnvConfigProvider, verifying it                                                  
[dbx][2022-09-27 08:53:35.024] Found auth config from provider ProfileEnvConfigProvider, verification successful                                       
[dbx][2022-09-27 08:53:35.026] Profile DEFAULT will be used for deployment                                                                             
[dbx][2022-09-27 08:53:36.502] Executing job: covid_analysis_etl_integ in environment default on cluster my_cluster_name (id: correctly-identified-id)
[dbx][2022-09-27 08:53:36.508] No build logic defined in the deployment file. Default pip-based build logic will be used.
[dbx][2022-09-27 08:53:36.509] Following the provided build logic                                                        
[dbx][2022-09-27 08:53:36.511] 🐍 Building a Python-based project                                                        
[dbx][2022-09-27 08:53:36.513] 🧹 Standard package folder  dist already exists, cleaning it before Python package build   
[dbx][2022-09-27 08:53:39.191] ✅ Python-based project build finished
[dbx][2022-09-27 08:53:39.193] Preparing interactive cluster to accept jobs
[dbx][2022-09-27 08:53:39.295] Preparing execution context
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\carlafernandez\miniconda3\envs\prueba_dbx\lib\site-packages\dbx\commands\execute.py:134 │
│ in execute                                                                                       │
│                                                                                                  │
│   131 │   dbx_echo("Preparing interactive cluster to accept jobs")                               │
│   132 │   controller.awake_cluster(cluster_id)                                                   │
│   133 │                                                                                          │
│ ❱ 134 │   context_client = RichExecutionContextClient(api_client, cluster_id)                    │
│   135 │                                                                                          │
│   136 │   controller_instance = ExecutionController(                                             │
│   137 │   │   client=context_client,                                                             │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │                 _job = 'covid_analysis_etl_integ'                                            │ │
│ │             _payload = {                                                                     │ │
│ │                        │   'name': 'covid_analysis_etl_integ',                               │ │
│ │                        │   'spark_python_task': {                                            │ │
│ │                        │   │   'python_file': 'file://jobs/covid_trends_job.py'              │ │
│ │                        │   }                                                                 │ │
│ │                        }                                                                     │ │
│ │           api_client = <databricks_cli.sdk.api_client.ApiClient object at                    │ │
│ │                        0x0000024E95F70AC0>                                                   │ │
│ │           cluster_id = '0606-205319-pcu6ddt9'                                                │ │
│ │         cluster_name = 'paa-sls-datalab-proveedoresanalitica'                                │ │
│ │               config = DeploymentConfig(                                                     │ │
│ │                        │   environments=[                                                    │ │
│ │                        │   │   EnvironmentDeploymentInfo(                                    │ │
│ │                        │   │   │   name='default',                                           │ │
│ │                        │   │   │   payload=Deployment(                                       │ │
│ │                        │   │   │   │   workflows=[                                           │ │
│ │                        │   │   │   │   │   {                                                 │ │
│ │                        │   │   │   │   │   │   'name': 'covid_analysis_etl_integ',           │ │
│ │                        │   │   │   │   │   │   'spark_python_task': {                        │ │
│ │                        │   │   │   │   │   │   │   'python_file':                            │ │
│ │                        'file://jobs/covid_trends_job.py'                                     │ │
│ │                        │   │   │   │   │   │   }                                             │ │
│ │                        │   │   │   │   │   },                                                │ │
│ │                        │   │   │   │   │   {                                                 │ │
│ │                        │   │   │   │   │   │   'name': 'covid_analysis_etl_prod',            │ │
│ │                        │   │   │   │   │   │   'new_cluster': {                              │ │
│ │                        │   │   │   │   │   │   │   'spark_version':                          │ │
│ │                        '9.1.x-cpu-ml-scala2.12',                                             │ │
│ │                        │   │   │   │   │   │   │   'num_workers': 1,                         │ │
│ │                        │   │   │   │   │   │   │   'node_type_id': 'Standard_DS4_v2'         │ │
│ │                        │   │   │   │   │   │   },                                            │ │
│ │                        │   │   │   │   │   │   'spark_python_task': {                        │ │
│ │                        │   │   │   │   │   │   │   'python_file':                            │ │
│ │                        'file://jobs/covid_trends_job.py',                                    │ │
│ │                        │   │   │   │   │   │   │   'parameters': ['--prod']                  │ │
│ │                        │   │   │   │   │   │   }                                             │ │
│ │                        │   │   │   │   │   },                                                │ │
│ │                        │   │   │   │   │   {                                                 │ │
│ │                        │   │   │   │   │   │   'name': 'covid_analysis_etl_raw',             │ │
│ │                        │   │   │   │   │   │   'new_cluster': {                              │ │
│ │                        │   │   │   │   │   │   │   'spark_version':                          │ │
│ │                        '9.1.x-cpu-ml-scala2.12',                                             │ │
│ │                        │   │   │   │   │   │   │   'num_workers': 1,                         │ │
│ │                        │   │   │   │   │   │   │   'node_type_id': 'Standard_DS4_v2'         │ │
│ │                        │   │   │   │   │   │   },                                            │ │
│ │                        │   │   │   │   │   │   'spark_python_task': {                        │ │
│ │                        │   │   │   │   │   │   │   'python_file':                            │ │
│ │                        'file://jobs/covid_trends_job_raw.py'                                 │ │
│ │                        │   │   │   │   │   │   }                                             │ │
│ │                        │   │   │   │   │   }                                                 │ │
│ │                        │   │   │   │   ]                                                     │ │
│ │                        │   │   │   )                                                         │ │
│ │                        │   │   )                                                             │ │
│ │                        │   ],                                                                │ │
│ │                        │   build=BuildConfiguration(                                         │ │
│ │                        │   │   no_build=False,                                               │ │
│ │                        │   │   commands=[],                                                  │ │
│ │                        │   │   python=<PythonBuild.pip: 'pip'>                               │ │
│ │                        │   )                                                                 │ │
│ │                        )                                                                     │ │
│ │        config_reader = <dbx.api.config_reader.ConfigReader object at 0x0000024E9625AD70>     │ │
│ │           controller = <dbx.api.cluster.ClusterController object at 0x0000024E95F31CF0>      │ │
│ │                debug = None                                                                  │ │
│ │           deployment = EnvironmentDeploymentInfo(                                            │ │
│ │                        │   name='default',                                                   │ │
│ │                        │   payload=Deployment(                                               │ │
│ │                        │   │   workflows=[                                                   │ │
│ │                        │   │   │   {                                                         │ │
│ │                        │   │   │   │   'name': 'covid_analysis_etl_integ',                   │ │
│ │                        │   │   │   │   'spark_python_task': {                                │ │
│ │                        │   │   │   │   │   'python_file': 'file://jobs/covid_trends_job.py'  │ │
│ │                        │   │   │   │   }                                                     │ │
│ │                        │   │   │   },                                                        │ │
│ │                        │   │   │   {                                                         │ │
│ │                        │   │   │   │   'name': 'covid_analysis_etl_prod',                    │ │
│ │                        │   │   │   │   'new_cluster': {                                      │ │
│ │                        │   │   │   │   │   'spark_version': '9.1.x-cpu-ml-scala2.12',        │ │
│ │                        │   │   │   │   │   'num_workers': 1,                                 │ │
│ │                        │   │   │   │   │   'node_type_id': 'Standard_DS4_v2'                 │ │
│ │                        │   │   │   │   },                                                    │ │
│ │                        │   │   │   │   'spark_python_task': {                                │ │
│ │                        │   │   │   │   │   'python_file': 'file://jobs/covid_trends_job.py', │ │
│ │                        │   │   │   │   │   'parameters': ['--prod']                          │ │
│ │                        │   │   │   │   }                                                     │ │
│ │                        │   │   │   },                                                        │ │
│ │                        │   │   │   {                                                         │ │
│ │                        │   │   │   │   'name': 'covid_analysis_etl_raw',                     │ │
│ │                        │   │   │   │   'new_cluster': {                                      │ │
│ │                        │   │   │   │   │   'spark_version': '9.1.x-cpu-ml-scala2.12',        │ │
│ │                        │   │   │   │   │   'num_workers': 1,                                 │ │
│ │                        │   │   │   │   │   'node_type_id': 'Standard_DS4_v2'                 │ │
│ │                        │   │   │   │   },                                                    │ │
│ │                        │   │   │   │   'spark_python_task': {                                │ │
│ │                        │   │   │   │   │   'python_file':                                    │ │
│ │                        'file://jobs/covid_trends_job_raw.py'                                 │ │
│ │                        │   │   │   │   }                                                     │ │
│ │                        │   │   │   }                                                         │ │
│ │                        │   │   ]                                                             │ │
│ │                        │   )                                                                 │ │
│ │                        )                                                                     │ │
│ │      deployment_file = WindowsPath('conf/deployment.yml')                                    │ │
│ │          environment = 'default'                                                             │ │
│ │           found_jobs = [                                                                     │ │
│ │                        │   {                                                                 │ │
│ │                        │   │   'name': 'covid_analysis_etl_integ',                           │ │
│ │                        │   │   'spark_python_task': {                                        │ │
│ │                        │   │   │   'python_file': 'file://jobs/covid_trends_job.py'          │ │
│ │                        │   │   }                                                             │ │
│ │                        │   }                                                                 │ │
│ │                        ]                                                                     │ │
│ │ jinja_variables_file = None                                                                  │ │
│ │                  job = None                                                                  │ │
│ │          job_payload = {                                                                     │ │
│ │                        │   'name': 'covid_analysis_etl_integ',                               │ │
│ │                        │   'spark_python_task': {                                            │ │
│ │                        │   │   'python_file': 'file://jobs/covid_trends_job.py'              │ │
│ │                        │   }                                                                 │ │
│ │                        }                                                                     │ │
│ │           no_package = False                                                                 │ │
│ │           no_rebuild = False                                                                 │ │
│ │           parameters = None                                                                  │ │
│ │   pip_install_extras = None                                                                  │ │
│ │    requirements_file = WindowsPath('requirements.txt')                                       │ │
│ │                 task = Task(                                                                 │ │
│ │                        │   spark_python_task=SparkPythonTask(                                │ │
│ │                        │   │   python_file=WindowsPath('jobs/covid_trends_job.py'),          │ │
│ │                        │   │   parameters=[]                                                 │ │
│ │                        │   ),                                                                │ │
│ │                        │   python_wheel_task=None,                                           │ │
│ │                        │   task_type=<TaskType.spark_python_task: 'spark_python_task'>       │ │
│ │                        )                                                                     │ │
│ │   upload_via_context = False                                                                 │ │
│ │             workflow = 'covid_analysis_etl_integ'                                            │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ C:\Users\carlafernandez\miniconda3\envs\prueba_dbx\lib\site-packages\dbx\api\context.py:121 in   │
│ __init__                                                                                         │
│                                                                                                  │
│   118                                                                                            │
│   119 class RichExecutionContextClient:                                                          │
│   120 │   def __init__(self, v2_client: ApiClient, cluster_id: str, language: str = "python")    │
│ ❱ 121 │   │   self._client = LowLevelExecutionContextClient(v2_client, cluster_id, language)     │
│   122 │                                                                                          │
│   123 │   def install_package(self, package_file: str, pip_install_extras: Optional[str]):       │
│   124 │   │   if pip_install_extras:                                                             │
│                                                                                                  │
│ ╭──────────────────────────────────────── locals ────────────────────────────────────────╮       │
│ │ cluster_id = '0606-205319-pcu6ddt9'                                                    │       │
│ │   language = 'python'                                                                  │       │
│ │       self = <dbx.api.context.RichExecutionContextClient object at 0x0000024E9625B7C0> │       │
│ │  v2_client = <databricks_cli.sdk.api_client.ApiClient object at 0x0000024E95F70AC0>    │       │
│ ╰────────────────────────────────────────────────────────────────────────────────────────╯       │
│                                                                                                  │
│ C:\Users\carlafernandez\miniconda3\envs\prueba_dbx\lib\site-packages\dbx\api\context.py:34 in    │
│ __init__                                                                                         │
│                                                                                                  │
│    31 │   def __init__(self, v2_client: ApiClient, cluster_id: str, language: str = "python"):   │
│    32 │   │   self._v1_client = ApiV1Client(v2_client)                                           │
│    33 │   │   self._cluster_id = cluster_id                                                      │
│ ❱  34 │   │   self._context_id = self.__get_context_id(language)                                 │
│    35 │                                                                                          │
│    36 │   def _wait_for_command_execution(self, command_id: str):                                │
│    37 │   │   finished = False                                                                   │
│                                                                                                  │
│ ╭────────────────────────────────────────── locals ──────────────────────────────────────────╮   │
│ │ cluster_id = '0606-205319-pcu6ddt9'                                                        │   │
│ │   language = 'python'                                                                      │   │
│ │       self = <dbx.api.context.LowLevelExecutionContextClient object at 0x0000024E9625A890> │   │
│ │  v2_client = <databricks_cli.sdk.api_client.ApiClient object at 0x0000024E95F70AC0>        │   │
│ ╰────────────────────────────────────────────────────────────────────────────────────────────╯   │
│                                                                                                  │
│ C:\Users\carlafernandez\miniconda3\envs\prueba_dbx\lib\site-packages\dbx\api\context.py:96 in    │
│ __get_context_id                                                                                 │
│                                                                                                  │
│    93 │                                                                                          │
│    94 │   def __get_context_id(self, language: str) -> str:                                      │
│    95 │   │   dbx_echo("Preparing execution context")                                            │
│ ❱  96 │   │   ctx = LocalContextManager.get_context()                                            │
│    97 │   │                                                                                      │
│    98 │   │   if ctx and self.__is_context_available(ctx.context_id):                            │
│    99 │   │   │   dbx_echo("Existing context is active, using it")                               │
│                                                                                                  │
│ ╭───────────────────────────────────────── locals ─────────────────────────────────────────╮     │
│ │ language = 'python'                                                                      │     │
│ │     self = <dbx.api.context.LowLevelExecutionContextClient object at 0x0000024E9625A890> │     │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────╯     │
│                                                                                                  │
│ C:\Users\carlafernandez\miniconda3\envs\prueba_dbx\lib\site-packages\dbx\api\context.py:25 in    │
│ get_context                                                                                      │
│                                                                                                  │
│    22 │   @classmethod                                                                           │
│    23 │   def get_context(cls) -> Optional[ContextInfo]:                                         │
│    24 │   │   if cls.context_file_path.exists():                                                 │
│ ❱  25 │   │   │   return ContextInfo(**JsonUtils.read(cls.context_file_path))                    │
│    26 │   │   else:                                                                              │
│    27 │   │   │   return None                                                                    │
│    28                                                                                            │
│                                                                                                  │
│ ╭────────────────────── locals ───────────────────────╮                                          │
│ │ cls = <class 'dbx.api.context.LocalContextManager'> │                                          │
│ ╰─────────────────────────────────────────────────────╯                                          │
│                                                                                                  │
│ C:\Users\carlafernandez\Documents\Projects\my_projects\ide-best-practices\pydantic\main.py:342   │
│ in pydantic.main.BaseModel.__init__                                                              │
│                                                                                                  │
│ [Errno 2] No such file or directory:                                                             │
│ 'C:\\Users\\carlafernandez\\Documents\\Projects\\my_projects\\ide-best-practices\\pydantic\\main │
│ .py'                                                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValidationError: 1 validation error for ContextInfo
context_id
  field required (type=value_error.missing)
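
Reading the last frames of the trace: LocalContextManager.get_context() found a context file on disk and passed its JSON contents to ContextInfo(...), and validation failed because no context_id key was present. The sketch below mimics that situation with the standard library only and shows one defensive way to read such a file. It is not dbx's code: the dataclass stands in for the pydantic model, and read_context and the file names are hypothetical.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ContextInfo:
    # Stand-in for the model named in the error; only the required field is modeled.
    context_id: str


def read_context(context_file: Path) -> Optional[ContextInfo]:
    """Return the stored context, or None if the file is missing or unusable."""
    if not context_file.exists():
        return None
    try:
        data = json.loads(context_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or "context_id" not in data:
        # The case the traceback reports as "field required".
        return None
    return ContextInfo(context_id=data["context_id"])


with tempfile.TemporaryDirectory() as tmp:
    # A context file that exists but carries no context_id (hypothetical name).
    stale = Path(tmp) / "context.json"
    stale.write_text("{}", encoding="utf-8")
    stale_result = read_context(stale)

    # A well-formed context file for comparison.
    valid = Path(tmp) / "valid.json"
    valid.write_text(json.dumps({"context_id": "abc123"}), encoding="utf-8")
    valid_result = read_context(valid)
```

Here stale_result is None instead of an exception, so a caller could fall back to creating a fresh execution context. Whether dbx should behave this way is a design question for the maintainers.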
Steps to Reproduce (for bugs)

1. Follow the steps in this tutorial to set up the environment.
2. Comment out lines 6-9 of deployment.yml.
3. Run dbx execute --cluster-name <cluster-name> covid_analysis_etl_integ against an existing cluster of your own.
Context

N/A
Your Environment

dbx version used: 0.7.5
Databricks Runtime version: 9.1 LTS ML (includes Apache Spark 3.1.2, Scala 2.12)
OS: Windows 10 Pro. Version 10.0.19044.
Virtual environment manager: Anaconda (miniconda)
Sep 27 '22 06:09 CarlaFernandez
Hi @CarlaFernandez, could you please check whether this still happens with the latest version of dbx?
Nov 06 '22 13:11 renardeinside