
Aggregator Based Workflow Tutorial Federated_Pytorch_MNIST_Tutorial.ipynb is not working. 

Open KeertiX opened this issue 2 years ago • 3 comments

Describe the bug Aggregator Based Workflow Tutorial Federated_Pytorch_MNIST_Tutorial.ipynb is not working. 

To Reproduce Run the tutorial openfl/openfl-tutorials/Federated_Pytorch_MNIST_Tutorial.ipynb 

Expected behavior Tutorial should run successfully without any error. 

Screenshots

Creating AGGREGATOR certificate key pair with following settings: CN=ktalwarx-mobl.gar.corp.intel.com, SAN=DNS:ktalwarx-mobl.gar.corp.intel.com
Writing AGGREGATOR certificate key pair to: /home/keerti/aggregator based worflow/cert/server
The CSR Hash 60c9e4d7778ab8bc06444cc976cfb6c5b3ab1346f91c207593bdc6d7dedb102ae3ae80fd64978344afc597225d61bf85
The CSR Hash for file server/agg_ktalwarx-mobl.gar.corp.intel.com.csr = 60c9e4d7778ab8bc06444cc976cfb6c5b3ab1346f91c207593bdc6d7dedb102ae3ae80fd64978344afc597225d61bf85
Warning: manual check of certificate hashes is bypassed in silent mode.
Signing AGGREGATOR certificate
Traceback (most recent call last):
  File "/home/keerti/aggregator based worflow/openfl/openfl-tutorials/Federated_Pytorch_MNIST_Tutorial.py", line 14, in
    fx.init("torch_cnn_mnist", log_level="METRIC", log_file="./spam_metric.log")
  File "/home/keerti/ls/envs/intelEnv/lib/python3.10/site-packages/openfl/native/native.py", line 203, in init
    collaborator.create(
AttributeError: module 'openfl.interface.collaborator' has no attribute 'create'. Did you mean: 'create_'?

Desktop:

  • OS: WSL Ubuntu
  • Python Version 3.8
  • Openfl latest build

KeertiX avatar May 30 '23 09:05 KeertiX

I can't seem to reproduce your issue. Can you provide some more information about your intelEnv environment? In particular, can you provide the output of python -m torch.utils.collect_env?

Also, how did you install openfl? The error leads me to believe there may have been an issue with installation. Possible for you to try to just run:

import openfl.native as fx
fx.init('torch_cnn_mnist', log_level='METRIC', log_file='./spam_metric.log')

in a fresh environment?

kminhta avatar May 30 '23 22:05 kminhta

The output of python -m torch.utils.collect_env is as follows:

(env-latest-original-openfl) parth-wsl@parthmax-mobl1:~/env-latest-original-openfl/openfl$ python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04 LTS (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.8.16 (default, Mar  2 2023, 03:21:46)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.17
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==1.13.1
[pip3] torchvision==0.14.1
[conda] numpy                     1.24.3                   pypi_0    pypi
[conda] torch                     1.13.1                   pypi_0    pypi
[conda] torchvision               0.14.1                   pypi_0    pypi

The fx.init function throws this error when called from any tutorial notebook.

While debugging, I found that openfl/native/native.py calls collaborator.create (from openfl/interface/collaborator.py) at line 203. However, openfl/interface/collaborator.py defines no create function, only a create_ function.
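The mismatch described above can be checked without stepping through the debugger. The following is only a minimal sketch: suggest_attribute is a hypothetical helper, and fake_collaborator is a stand-in module built for illustration, not the real openfl.interface.collaborator.

```python
import difflib
import types


def suggest_attribute(module, wanted):
    """Return names in `module` that match or closely resemble `wanted`."""
    if hasattr(module, wanted):
        return [wanted]
    # Skip dunder names such as __name__ and __doc__.
    candidates = [name for name in dir(module) if not name.startswith("__")]
    return difflib.get_close_matches(wanted, candidates, n=3, cutoff=0.6)


# Hypothetical stand-in that mimics the situation in the traceback:
# the module exposes create_ but not create.
fake_collaborator = types.ModuleType("fake_collaborator")
fake_collaborator.create_ = lambda *args, **kwargs: None

print(suggest_attribute(fake_collaborator, "create"))
```

With OpenFL installed, the same helper could presumably be pointed at the real module (import openfl.interface.collaborator as collaborator) to see which name that installation actually exposes.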

To reproduce the error fetch the latest code from the develop branch.

ParthM-GitHub avatar Jun 01 '23 12:06 ParthM-GitHub

Thanks, this is reproducible on the latest build. We are working to fix this

kminhta avatar Jun 01 '23 16:06 kminhta

I still have this issue as of March 2024. Was there any solution? I've been googling for days.

mccawley74 avatar Mar 22 '24 21:03 mccawley74

PR #835 is still open. You can install from the kta-intel:fx-init fork directly, which has a fix, or you can try using the task runner CLI.

kminhta avatar Mar 25 '24 20:03 kminhta

Thanks for getting back to me Kevin, I appreciate the help. I’ll try the fork with the fix, or fall back to the task runner method as you suggest.


mccawley74 avatar Mar 25 '24 20:03 mccawley74

Kevin, thank you for the help earlier.

I started from a fresh Ubuntu install, but ran into the same issue. I cloned and built kta-intel/openfl at fx-init (https://github.com/kta-intel/openfl/tree/fx-init). However, the error persists.

Examples for Running a Federation, Task Runner API: Federated PyTorch MNIST (https://openfl.readthedocs.io/en/latest/get_started/examples/taskrunner_pytorch_mnist.html#taskrunner-pytorch-mnist)

The sample code (NOTE: all imports work without issue):

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import openfl.native as fx
from openfl.federated import FederatedModel, FederatedDataSet

# Setup default workspace, logging, etc.
fx.init('torch_cnn_mnist', log_level='METRIC', log_file='./spam_metric.log')

The error:

Traceback (most recent call last):
  File "task_runner.py", line 15, in
    fx.init('torch_cnn_mnist', log_level='METRIC', log_file='./spam_metric.log')
  File "/home/mark/projects/OpenFL/openfl/openfl/native/native.py", line 203, in init
    collaborator.create(
AttributeError: module 'openfl.interface.collaborator' has no attribute 'create'

Not sure if the information above is enough to know offhand why this is still occurring.

This is simply following the intro instructions on the official OpenFL website, the same site customers use, so I’m very concerned. Task Runner API: Federated PyTorch MNIST (OpenFL 2024.2 documentation): https://openfl.readthedocs.io/en/latest/get_started/examples/taskrunner_pytorch_mnist.html#taskrunner-pytorch-mnist


mccawley74 avatar Mar 27 '24 18:03 mccawley74

Can you try installing from the fx-init branch?

git clone https://github.com/kta-intel/openfl.git
cd openfl
git checkout fx-init
pip install .
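After an install like this, it can help to confirm which copy of a package Python will actually import, since a stale install elsewhere on the path would keep raising the old error. This is a rough sketch only: describe_install is a hypothetical helper, and it is demonstrated on the standard-library json package rather than on openfl.

```python
import importlib.util
from importlib import metadata


def describe_install(package):
    """Report where `package` would be imported from and its installed version."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None  # the package is not importable in this environment
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # importable, but no distribution metadata was found
    return {"location": spec.origin, "version": version}


# Demonstrated on a standard-library package; replace "json" with "openfl"
# in an environment where OpenFL is installed.
print(describe_install("json"))
```

If the reported location points somewhere other than the environment just installed into, the older copy is probably still the one being imported.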

kminhta avatar Mar 27 '24 20:03 kminhta

Will do sir.


mccawley74 avatar Mar 27 '24 21:03 mccawley74

Ah, very nice, that indeed fixed the issue. I know this is probably simple stuff to you, but coming from firmware development I am not at all familiar with OpenFL's features.

I wish I could pick your brain to understand what the issue was in the code and learn a bit about OpenFL from an Intel expert. However, I sadly don’t have the time.

Thank you for all the help. You have no idea how much I appreciate the prompt responses to my emails and the solutions you’ve provided.


mccawley74 avatar Mar 28 '24 16:03 mccawley74

Glad we could resolve the issue! Please feel free to reach out anytime. Always happy to help and answer any questions

kminhta avatar Mar 29 '24 17:03 kminhta

Indeed. I've been off and running since the solution yesterday, much code, and models being deployed without additional issues.

Thank you again for the help.


mccawley74 avatar Mar 29 '24 17:03 mccawley74