UnboundLocalError: cannot access local variable 'pipelines_package' where it is not associated with a value

Open JenspederM opened this issue 9 months ago • 8 comments

Description

An error is thrown when printing the result of find_pipelines() from the kedro.framework.project module.

Context

Unable to use find_pipelines

Steps to Reproduce

  1. Add print(find_pipelines()) to the bottom of the pipeline_registry.py file
  2. Run the file: python ./src/<project>/pipeline_registry.py

Expected Result

A dict of pipelines.

Actual Result

I get the following error:

[05/02/24 18:05:49] WARNING  /Users/.../.venv/lib/python3.12/site-pac warnings.py:110
                             kages/kedro/framework/project/__init__.py:350: UserWarning: An error                      
                             occurred while importing the 'None.pipeline' module. Nothing defined                      
                             therein will be returned by 'find_pipelines'.                                             
                                                                                                                       
                             Traceback (most recent call last):                                                        
                               File                                                                                    
                             "/Users/.../.venv/lib/python3.12/site-pa                
                             ckages/kedro/framework/project/__init__.py", line 347, in find_pipelines                  
                                 pipeline_module = importlib.import_module(pipeline_module_name)                       
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                       
                               File
                             "/Users/.../.rye/py/cpython@3.12.2/install/lib/python3.12/i
                             mportlib/__init__.py", line 90, in import_module
                                 return _bootstrap._gcd_import(name[level:], package, level)                           
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                           
                               File "<frozen importlib._bootstrap>", line 1387, in _gcd_import                         
                               File "<frozen importlib._bootstrap>", line 1360, in _find_and_load                      
                               File "<frozen importlib._bootstrap>", line 1310, in                                     
                             _find_and_load_unlocked                                                                   
                               File "<frozen importlib._bootstrap>", line 488, in                                      
                             _call_with_frames_removed                                                                 
                               File "<frozen importlib._bootstrap>", line 1387, in _gcd_import                         
                               File "<frozen importlib._bootstrap>", line 1360, in _find_and_load                      
                               File "<frozen importlib._bootstrap>", line 1324, in                                     
                             _find_and_load_unlocked                                                                   
                             ModuleNotFoundError: No module named 'None'                                               
                                                                                                                       
                               warnings.warn(                                                                          
                                                                                                                       
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/.../project/src/project/pipeline_registy.py:21 in <module>                                                                             │
│                                                                                                  │
│   18                                                                                             │
│   19                                                                                             │
│   20 if __name__ == "__main__":                                                                  │
│ ❱ 21 │   print(register_pipelines())                                                             │
│   22                                                                                             │
│                                                                                                  │
│ /Users/.../project/src/project/pipeline_registry.py:15 in register_pipelines                                                                   │
│                                                                                                  │
│   12 │   Returns:                                                                                │
│   13 │   │   A mapping from pipeline names to ``Pipeline`` objects.                              │
│   14 │   """                                                                                     │
│ ❱ 15 │   pipelines = find_pipelines()                                                            │
│   16 │   pipelines["__default__"] = sum(pipelines.values())                                      │
│   17 │   return pipelines                                                                        │
│   18                                                                                             │
│                                                                                                  │
│ /Users/.../.venv/lib/python3.12/site-packages/kedro/framework/project/__init__.py:367 in find_pipelines                                                        │
│                                                                                                  │
│   364 │   │   if str(exc) == f"No module named '{PACKAGE_NAME}.pipelines'":                      │
│   365 │   │   │   return pipelines_dict                                                          │
│   366 │                                                                                          │
│ ❱ 367 │   for pipeline_dir in pipelines_package.iterdir():                                       │
│   368 │   │   if not pipeline_dir.is_dir():                                                      │
│   369 │   │   │   continue                                                                       │
│   370                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
UnboundLocalError: cannot access local variable 'pipelines_package' where it is not associated with a value

Your Environment

  • Kedro version used (pip show kedro or kedro -V): kedro, version 0.19.5
  • Python version used (python -V): Python 3.12.2 using rye as package manager
  • Operating system and version: M1 Mac with macOS Sonoma Version 14.4.1

JenspederM, May 02 '24 16:05

Hi @JenspederM, thanks for flagging this issue. Can I ask what your use case is for printing the result of find_pipelines()?

This function was added to enable automatic discovery of pipelines, and it does some work behind the scenes to make sure your project and its modules are discoverable (https://docs.kedro.org/en/stable/nodes_and_pipelines/pipeline_registry.html). It's meant to run as part of a "regular" Kedro flow, where it's preceded by certain project setup methods. You can fix your script by calling bootstrap_project() before find_pipelines() (https://docs.kedro.org/en/stable/kedro_project_setup/session.html#bootstrap-project-and-configure-project). However, I would only recommend doing that for exploration, not if you're planning to run that code in production.

Let me know if this makes sense!

merelcht, May 21 '24 10:05
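
A minimal sketch of the suggestion above, assuming the default layout in which the script lives at src/<project>/pipeline_registry.py. The helper name discover_pipelines is hypothetical; bootstrap_project and find_pipelines are the Kedro functions named in the comment. As noted there, this is meant for exploration rather than production use.

```python
from pathlib import Path


def discover_pipelines(project_path):
    """Bootstrap a Kedro project, then return its auto-discovered pipelines.

    `discover_pipelines` is a hypothetical helper for illustration only.
    Kedro is imported inside the function so that defining the helper
    does not require Kedro until it is actually called.
    """
    from kedro.framework.project import find_pipelines
    from kedro.framework.startup import bootstrap_project

    # Reads the project metadata and configures the project package,
    # which find_pipelines() needs before it can locate any pipelines.
    bootstrap_project(Path(project_path))
    return find_pipelines()
```

From pipeline_registry.py this could be called as print(discover_pipelines(Path(__file__).resolve().parents[2])), assuming the project root sits two directories above the package.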

Hi @merelcht,

Thank you for your reply.

I am using find_pipelines() to generate Databricks Asset Bundle resources. I am working on a template for asset bundles that uses Kedro for defining pipelines and dependencies and Databricks Workflows for scheduling. You can find the project here

Thanks for suggesting bootstrap_project(). For now, I have been using configure_project(<package-name>), as used in databricks_run.py in the databricks-iris starter.

You can see my exact usage right here

JenspederM, May 21 '24 12:05
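
For comparison, a sketch of the configure_project() approach described above. The package name my_project is a placeholder, and the sketch assumes that the package is importable (for example, installed in the environment or with src/ on sys.path). The helper name is hypothetical.

```python
def discover_pipelines_for_package(package_name):
    """Configure Kedro for an importable package, then discover pipelines.

    Hypothetical helper for illustration; `package_name` must be importable.
    """
    from kedro.framework.project import configure_project, find_pipelines

    # Tells Kedro which package to use, so that find_pipelines() knows
    # where to look for pipeline modules.
    configure_project(package_name)
    return find_pipelines()


# Example call (placeholder name): discover_pipelines_for_package("my_project")
```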

@merelcht

I have been thinking of making a cookiecutter for Kedro as well. Do you think there would be any interest in this?

I made the template based on my own experience of running large scale Databricks projects in production with many contributors of varying levels of experience.

JenspederM, May 27 '24 13:05

I'd say that, regardless of use case, internal code should not raise an UnboundLocalError; it should raise a more informative error instead.

> I have been thinking of making a cookiecutter for Kedro as well. Do you think there would be any interest in this?

Of course! When you get to do it, we can promote it on https://github.com/kedro-org/awesome-kedro

Also consider exploring https://github.com/copier-org/copier/, a modern alternative to cookiecutter

astrojuanlu, May 29 '24 09:05

The only problem that I haven't really found a solution for is how I would get the workspace host from the users' Databricks config without using the Databricks CLI.

JenspederM, Jun 03 '24 11:06

> I'd say that, regardless of use case, internal code should not raise an UnboundLocalError; it should raise a more informative error instead.

@astrojuanlu I also looked into the UnboundLocalError, and I see that it could be resolved by adding asserts or running validate_settings() in find_pipelines() and ParallelRunner._run().

Or does it deserve a greater redesign?

IMO global variables can be quite dangerous when used like this, so I would probably advise redesigning this logic to remove the use of globals.

JenspederM, Jun 03 '24 12:06
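
To illustrate the failure mode under discussion, here is a simplified stand-in (not Kedro's actual source) that follows the pattern visible in the traceback: an import wrapped in try/except, an early return that only triggers on one exact error message, and a later use of the variable assigned inside the try. The function names, the RuntimeError, and its message are assumptions made for this sketch.

```python
import importlib

# Stand-in for a module-level global that is only set once the project
# has been configured. It is deliberately left unset here.
PACKAGE_NAME = None


def find_modules_unguarded():
    """Simplified stand-in for the failing pattern; not Kedro's source."""
    found = {}
    try:
        package = importlib.import_module(f"{PACKAGE_NAME}.pipelines")
    except ImportError as exc:
        # With PACKAGE_NAME unset, the message is "No module named 'None'",
        # which does not match the string below, so execution falls through.
        if str(exc) == f"No module named '{PACKAGE_NAME}.pipelines'":
            return found
    # `package` was never assigned, so UnboundLocalError is raised here.
    found[package.__name__] = package
    return found


def find_modules_guarded():
    """Same logic, but fails early with an explicit message."""
    if PACKAGE_NAME is None:
        raise RuntimeError(
            "Package name is not set; configure the project before "
            "discovering pipelines."
        )
    return find_modules_unguarded()
```

Calling find_modules_unguarded() raises UnboundLocalError, while find_modules_guarded() raises the more descriptive RuntimeError. The guard only addresses the unset global; removing the global altogether, as suggested in the comment, would be a larger redesign.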

Moving this to our Inbox so that we can look at it and it doesn't get lost.

astrojuanlu, Jul 04 '24 09:07

> IMO global variables can be quite dangerous when used like this, so I would probably advise redesigning this logic to remove the use of globals.

For the record, I agree

astrojuanlu, Jul 04 '24 09:07