[Feature Request] [Workflow API] Enable users to import helper functions / utilities from a separate python script in FederatedRuntime Notebooks

Open refai06 opened this issue 7 months ago • 1 comments

Is your feature request related to a problem? Please describe.

This issue is related to #1565, where a user encountered a problem importing user-defined modules in Jupyter notebooks while running a FederatedRuntime experiment.

In the current implementation of the Workflow API, the Jupyter notebook is expected to define the Federated Learning experiment in its entirety. If the user attempts to import a user-defined module from a different Python script, it will fail for the following reasons:

- When the notebook is exported, the script inside the generated_workspace does not contain the user-defined code
- This leads to a ModuleNotFoundError during execution on participants in a distributed infrastructure

Describe the solution you'd like

Enable users to import helper functions from a separate Python script. For example, the FL experiment tutorial crowd_guard.ipynb imports helper functions / classes from the user-defined script validation.py, both located in a folder named workspace:

workspace
├── crowd_guard.ipynb 
└── validation.py

validation.py (contains helper class)

class CrowdGuardClientValidation:

    def __distance_global_model_final_metric(distance_type: str, prediction_matrix,
                                             prediction_global_model, sample_indices_by_label,
                                             own_index):
        ...

    def __predict_for_single_model(model, local_data, device):
        ...

    def __do_predictions(models, global_model, local_data, device):
        ...

    def __prune_poisoned_models(num_layers, total_number_of_clients, own_client_index,
                                distances_by_metric, verbose=False):
        ...

    def validate_models(global_model, models, own_client_index, local_data, device):
        ...

crowd_guard.ipynb (Jupyter notebook for Workflow API experiment)

#| export

from validation import CrowdGuardClientValidation

class FederatedFlow_CrowdGuard(FLSpec):

    @aggregator
    def start(self):
        ...

    @collaborator
    def train(self):
        ...

    @collaborator
    def local_validation(self):
        ...
        detected_suspicious_models = CrowdGuardClientValidation.validate_models(self.global_model,
                                                                                all_models,
                                                                                own_client_index,
                                                                                self.train_loader,
                                                                                self.device)
        ...

    @aggregator
    def end(self):
        ...

To support this use case, the existing export process in notebook_tools needs to be enhanced to:

- Analyse all imports in the Jupyter Notebook and identify user-defined imports
- Copy the user-defined Python scripts / folders containing these imports into the generated_workspace

This ensures that the generated_workspace (shown below) includes all user-defined code and therefore works on the distributed infrastructure:

workspace
├── generated_workspace
│   ├── src
│   │   ├── __init__.py
│   │   ├── experiment.py
│   │   └── validation.py
│   ├── .workspace
│   ├── plan
│   │   └── plan.yaml
│   └── requirements.txt
├── crowd_guard.ipynb
└── validation.py
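
The two export steps described above (identify user-defined imports, then copy them) could be sketched roughly as follows. This is only an illustration, not the notebook_tools implementation: find_local_imports and copy_local_imports are hypothetical names, and the sketch assumes that an import counts as user-defined when a matching .py file or folder sits next to the notebook.

```python
import ast
import shutil
from pathlib import Path


def find_local_imports(source: str, notebook_dir: Path) -> set:
    """Return the scripts/folders in notebook_dir that the source imports.

    Hypothetical helper: an import is treated as user-defined when
    <name>.py or a folder called <name> exists next to the notebook.
    """
    top_level_names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            top_level_names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            top_level_names.add(node.module.split(".")[0])

    local_paths = set()
    for name in top_level_names:
        script = notebook_dir / f"{name}.py"
        package = notebook_dir / name
        if script.is_file():
            local_paths.add(script)
        elif package.is_dir():
            local_paths.add(package)
    return local_paths


def copy_local_imports(paths, src_dir: Path) -> None:
    """Copy the identified scripts/folders into the workspace src directory."""
    src_dir.mkdir(parents=True, exist_ok=True)
    for path in paths:
        if path.is_dir():
            shutil.copytree(path, src_dir / path.name, dirs_exist_ok=True)
        else:
            shutil.copy2(path, src_dir / path.name)
```

With the layout above, passing the exported script's source and the workspace folder to find_local_imports would be expected to yield validation.py, which copy_local_imports then places under generated_workspace/src.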

Describe alternatives you've considered

N.A.

Additional context

This enhancement shall be based on the following requirements and guidelines:

Export Directives
- User-defined imports should be present in a notebook cell that is annotated with the #| export directive as its first line
- Rationale:
  - The #| export directive is required so that user-defined imports are written to the exported script and are available for further processing
User-defined scripts should not install any packages:
- User-defined scripts should not install any packages
- Rationale:
  - While the exported script is analyzed to identify dependencies and build the requirements.txt for the FL experiment, user-defined scripts are not analyzed by the infrastructure to identify dependencies
Location of user-defined Python scripts:
- User-defined modules must be placed in the same directory as the Jupyter Notebook to enable the infrastructure to correctly locate and copy these modules into the generated_workspace
- Rationale:
  - Custom path dependencies: consider a user-defined module located outside the notebook directory, for example:
```
/home/users
└── fl_helper
    ├── __init__.py
    └── validation.py
```
    crowd_guard.ipynb (Jupyter notebook for Workflow API experiment)
```
...
sys.path.append('/home/users/fl_helper')

from validation import CrowdGuardClientValidation
```
  - Importing from a custom path requires explicit modification of sys.path, which is not recommended and can lead to inconsistencies across a distributed system
  - Relying on custom paths or module locations outside the notebook directory adds complexity for the infrastructure in identifying and accessing the required user-defined modules
  - Placing the modules in the same directory as the Jupyter Notebook streamlines the process, simplifies access, and eliminates the need to modify sys.path. Example:
```
workspace
├── crowd_guard.ipynb 
└── fl_helper
    ├── __init__.py
    └── validation.py
```

Restrictions on User-defined imports

User-defined code should not modify sys.path to enable Python to find the scripts to import. For example, the following usage is not supported:

workspace
├── crowd_guard.ipynb 
└── helper
    └── validation.py

crowd_guard.ipynb (Jupyter notebook for Workflow API experiment)

...
...
sys.path.append('./helper')

from validation import CrowdGuardClientValidation
...

Recommended usage:

from helper.validation import CrowdGuardClientValidation
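
One possible way for the export step to detect the unsupported pattern is a static check of the exported source. The sketch below is an assumption about how such a check might look, not existing OpenFL behaviour; modifies_sys_path is a hypothetical name. It is deliberately conservative: any method call on sys.path is flagged, and aliased forms such as "from sys import path" are not caught.

```python
import ast


def _is_sys_path(node) -> bool:
    """True if the AST node is the attribute access `sys.path`."""
    return (
        isinstance(node, ast.Attribute)
        and node.attr == "path"
        and isinstance(node.value, ast.Name)
        and node.value.id == "sys"
    )


def modifies_sys_path(source: str) -> bool:
    """Hypothetical check: does the source call a method on sys.path or assign to it?"""
    for node in ast.walk(ast.parse(source)):
        # Calls such as sys.path.append(...) or sys.path.insert(...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            if _is_sys_path(node.func.value):
                return True
        # Assignments such as sys.path = [...] or sys.path += [...].
        targets = []
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AugAssign):
            targets = [node.target]
        if any(_is_sys_path(target) for target in targets):
            return True
    return False
```

The export could then warn or fail early when modifies_sys_path returns True for a notebook cell, rather than failing later on a participant.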

User-defined imports should be self-contained:

User-defined code should not import other user-defined code from different Python scripts. For example, the following is not supported:

utils.py (contains additional helper functions)

def calculate_accuracy(predictions, labels):
    correct = (predictions == labels).sum()
    return correct / len(labels)

validation.py (contains helper functions)

from utils import calculate_accuracy

class CrowdGuardClientValidation:
    def validate_models(global_model, models, own_client_index, local_data, device):
        ...
        accuracy = calculate_accuracy(predictions, labels)
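
The self-containment rule could be verified in a similar way, by checking whether a user-defined script imports anything that resolves to a sibling script or folder. Again, this is a sketch under assumptions: local_imports_in is a hypothetical helper, and relative imports are always treated as user-defined.

```python
import ast
from pathlib import Path


def local_imports_in(module_path: Path) -> set:
    """Hypothetical helper: names imported by module_path that resolve to
    sibling scripts or folders in the same directory."""
    base = module_path.parent
    tree = ast.parse(module_path.read_text(encoding="utf-8"))
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            if node.level > 0:
                # Relative imports always point at user-defined code.
                found.add("." * node.level + (node.module or ""))
                continue
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if (base / f"{top}.py").is_file() or (base / top).is_dir():
                found.add(top)
    return found
```

For the example above, local_imports_in applied to validation.py would report utils, so the export could reject validation.py as not self-contained.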

Jun 03 '25 12:06 refai06

I've run into the exact same issue, and I also think this functionality would be quite handy, especially in bigger projects.

I also asked about this in another thread a few days ago, and the response was https://github.com/securefederatedai/openfl/issues/1565#issuecomment-2899848065 :

At present, importing user-defined modules from separate Python files is not supported and [...] We will [...] consider the possibility of supporting this functionality in a future release

Jun 03 '25 20:06 tayfunceylan