Aggregator Based Workflow Tutorial Federated_Pytorch_MNIST_Tutorial.ipynb is not working.
Describe the bug: Aggregator Based Workflow Tutorial Federated_Pytorch_MNIST_Tutorial.ipynb is not working.
To Reproduce: Run the tutorial openfl/openfl-tutorials/Federated_Pytorch_MNIST_Tutorial.ipynb
Expected behavior: The tutorial should run successfully without any error.
Screenshots
Creating AGGREGATOR certificate key pair with following settings: CN=ktalwarx-mobl.gar.corp.intel.com, SAN=DNS:ktalwarx-mobl.gar.corp.intel.com
Writing AGGREGATOR certificate key pair to: /home/keerti/aggregator based worflow/cert/server
The CSR Hash 60c9e4d7778ab8bc06444cc976cfb6c5b3ab1346f91c207593bdc6d7dedb102ae3ae80fd64978344afc597225d61bf85
The CSR Hash for file server/agg_ktalwarx-mobl.gar.corp.intel.com.csr = 60c9e4d7778ab8bc06444cc976cfb6c5b3ab1346f91c207593bdc6d7dedb102ae3ae80fd64978344afc597225d61bf85
Warning: manual check of certificate hashes is bypassed in silent mode.
Signing AGGREGATOR certificate
Traceback (most recent call last):
File "/home/keerti/aggregator based worflow/openfl/openfl-tutorials/Federated_Pytorch_MNIST_Tutorial.py", line 14, in
Desktop:
- OS: WSL Ubuntu
- Python version: 3.8
- OpenFL: latest build
I can't seem to reproduce your issue. Can you provide more information about your intelEnv environment? In particular, can you share the output of python -m torch.utils.collect_env?
Also, how did you install openfl? The error leads me to believe there may have been an issue with the installation. Could you try running just:
import openfl.native as fx
fx.init('torch_cnn_mnist', log_level='METRIC', log_file='./spam_metric.log')
in a fresh environment?
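collect_env focuses on PyTorch and related libraries, so it may not show which openfl version is installed. A minimal sketch, using only the standard library's importlib.metadata (Python 3.8+), that reports the installed versions of a few packages in the current interpreter; the package list and the helper name installed_versions are illustrative assumptions, not part of openfl:

```python
import platform
import sys
from importlib import metadata  # available in the standard library since Python 3.8


def installed_versions(packages):
    """Return {distribution_name: version or None}.

    None means the distribution is not installed in the environment
    of the interpreter running this script.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python:", sys.version.split()[0], "on", platform.platform())
    # Example package list (assumption): adjust to whatever is relevant.
    for pkg, ver in installed_versions(["openfl", "torch", "torchvision"]).items():
        print(f"{pkg}: {ver or 'not installed'}")
```

Running it with the same interpreter that runs the notebook kernel matters; a different interpreter may see a different set of installed packages.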
The output of python -m torch.utils.collect_env is as follows:
(env-latest-original-openfl) parth-wsl@parthmax-mobl1:~/env-latest-original-openfl/openfl$ python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04 LTS (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.17
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==1.13.1
[pip3] torchvision==0.14.1
[conda] numpy 1.24.3 pypi_0 pypi
[conda] torch 1.13.1 pypi_0 pypi
[conda] torchvision 0.14.1 pypi_0 pypi
The fx.init function throws this error when called from any tutorial notebook.
While debugging, I found that openfl/native/native.py calls the collaborator.create function (from openfl/interface/collaborator.py) at line 203, but openfl/interface/collaborator.py has no create function; it only defines a create_ function.
To reproduce the error, fetch the latest code from the develop branch.
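The mismatch described above (native.py calling collaborator.create while the module only defines create_) can be checked without stepping through a debugger. A minimal sketch using importlib and hasattr; report_attributes is a hypothetical diagnostic helper, not part of openfl:

```python
import importlib


def report_attributes(module_name, names):
    """Import module_name and report which of the given attribute names it defines.

    Returns a dict mapping each name to True/False. ImportError propagates
    if the module itself cannot be imported.
    """
    module = importlib.import_module(module_name)
    return {name: hasattr(module, name) for name in names}


if __name__ == "__main__":
    # Hypothetical usage against the module named in this issue; requires openfl.
    try:
        print(report_attributes("openfl.interface.collaborator", ["create", "create_"]))
    except ImportError as exc:
        print("openfl could not be imported:", exc)
```

On an affected build, this would be expected to report create as False and create_ as True, consistent with the debugging notes; it has not been run against that build here.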
Thanks, this is reproducible on the latest build. We are working on a fix.
I'm still having this issue as of March 2024. Was there ever a solution? I've been googling for days.
PR #835 is still open. You can install directly from the kta-intel:fx-init fork, which has a fix, or you can try using the task runner CLI.
Thanks for getting back to me Kevin, I appreciate the help. I’ll try the fork with the fix, or fallback to the task runner method as you suggest.
Mark McCawley Federal Software Solutions IFL | Office of CTO Phone:1+ (503) 712-7128 @.***
From: Kevin Ta @.> Sent: Monday, March 25, 2024 1:07 PM To: securefederatedai/openfl @.> Cc: Mccawley, Mark A @.>; Comment @.> Subject: Re: [securefederatedai/openfl] Aggregator Based Workflow Tutorial Federated_Pytorch_MNIST_Tutorial.ipynb is not working. (Issue #834)
PR #835 (https://github.com/securefederatedai/openfl/pull/835) is still open. You can install directly from the kta-intel:fx-init fork (https://github.com/kta-intel/openfl/tree/fx-init), which has a fix, or you can try using the task runner CLI (https://openfl.readthedocs.io/en/latest/about/features_index/taskrunner.html#bare-metal-approach).
— Reply to this email directly, view it on GitHubhttps://github.com/securefederatedai/openfl/issues/834#issuecomment-2018817613, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AFRMLYUACILMMO4LMZDTQLTY2B7VZAVCNFSM6AAAAAAYTYDPHGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMJYHAYTONRRGM. You are receiving this because you commented.Message ID: @.@.>>
Kevin, thank you for the help earlier.
I started with a fresh Ubuntu install but ran into the same issue. I cloned and built kta-intel/openfl at the fx-init branch (https://github.com/kta-intel/openfl/tree/fx-init); however, the error persists.
Examples for Running a Federation: Task Runner API: Federated PyTorch MNIST (https://openfl.readthedocs.io/en/latest/get_started/examples/taskrunner_pytorch_mnist.html#taskrunner-pytorch-mnist)
The sample code: NOTE: All imports work without issue.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import openfl.native as fx
from openfl.federated import FederatedModel, FederatedDataSet

# Set up default workspace, logging, etc.
fx.init('torch_cnn_mnist', log_level='METRIC', log_file='./spam_metric.log')

The error:
Traceback (most recent call last):
  File "task_runner.py", line 15, in
    fx.init('torch_cnn_mnist', log_level='METRIC', log_file='./spam_metric.log')
  File "/home/mark/projects/OpenFL/openfl/openfl/native/native.py", line 203, in init
    collaborator.create(
AttributeError: module 'openfl.interface.collaborator' has no attribute 'create'

Not sure if the information above is enough to tell offhand why this is still occurring.
This is simply following the intro instructions on the official OpenFL website, the same site customers use, so I'm very concerned. Task Runner API: Federated PyTorch MNIST (OpenFL 2024.2 documentation): https://openfl.readthedocs.io/en/latest/get_started/examples/taskrunner_pytorch_mnist.html#taskrunner-pytorch-mnist
Can you try installing from the fx-init branch?
git clone https://github.com/kta-intel/openfl.git
cd openfl
git checkout fx-init
pip install .
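Since an earlier clone-and-build attempt still produced the same AttributeError, it may help to confirm which copy of openfl the interpreter actually imports after running pip install . on the fx-init branch. A minimal sketch using only the standard library (importlib.util.find_spec and importlib.metadata); locate_package is a hypothetical helper name, and this is a general troubleshooting step rather than a diagnosis confirmed in this thread:

```python
import importlib.util
from importlib import metadata


def locate_package(name):
    """Return (import_origin, installed_version) for a top-level package.

    Either element is None when it cannot be determined: origin is None if
    the package is not importable, version is None if no installed
    distribution metadata exists under that name.
    """
    spec = importlib.util.find_spec(name)
    origin = spec.origin if spec is not None else None
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None
    return origin, version


if __name__ == "__main__":
    origin, version = locate_package("openfl")
    print("openfl imported from:", origin)
    print("openfl distribution version:", version)
```

If the printed path points at an unexpected environment or an older checkout, a stale installation may be shadowing the newly built one.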
Will do sir.
From: Kevin Ta @.> Sent: Wednesday, March 27, 2024 1:52 PM To: securefederatedai/openfl @.> Cc: Mccawley, Mark A @.>; Comment @.> Subject: Re: [securefederatedai/openfl] Aggregator Based Workflow Tutorial Federated_Pytorch_MNIST_Tutorial.ipynb is not working. (Issue #834)
Ah, very nice, that indeed fixed the issue. I know this is probably simple stuff to you, but coming from firmware development, I am not at all familiar with OpenFL's features.
I wish I could pick your brain, understand what the issue was in the code, and learn a bit about OpenFL from an Intel expert; sadly, I don't have the time.
Thank you for all the help. You have no idea how much I appreciate the prompt responses to my emails. And solutions you’ve provided.
Glad we could resolve the issue! Please feel free to reach out anytime. Always happy to help and answer any questions.
Indeed. I've been off and running since yesterday's fix, writing plenty of code and deploying models without any further issues.
Thank you again for the help.
From: Kevin Ta @.> Sent: Friday, March 29, 2024 10:10 AM To: securefederatedai/openfl @.> Cc: Mccawley, Mark A @.>; Comment @.> Subject: Re: [securefederatedai/openfl] Aggregator Based Workflow Tutorial Federated_Pytorch_MNIST_Tutorial.ipynb is not working. (Issue #834)