
Migrating TaskRunner based FeTS Task_1 Challenge to Workflow API

Open kagrawa2 opened this issue 9 months ago • 5 comments

As part of this PR, we are migrating the Task_1 challenge from the TaskRunner API to the Workflow API. For more details on the Workflow API, refer to: https://openfl.readthedocs.io/en/latest/about/features_index/workflowinterface.html

We are running the experiment with LocalRuntime (https://openfl.readthedocs.io/en/latest/about/features_index/workflowinterface.html#localruntime).

Python 3.10 to 3.13 are now the supported versions, and OpenFL has been upgraded to 1.7.1. Accordingly, we have upgraded GaNDLF to version 0.1.0.
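
The version requirements above can be checked before launching a run. The snippet below is a minimal sketch, not part of this PR: `is_supported_python`, `check_environment`, and the constants are illustrative names, the version numbers mirror the ones stated above, and the distribution names `openfl` and `GANDLF` are assumptions to verify against your own install. It reports exact-version mismatches as warnings rather than failing hard.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Requirements as stated in the PR description (hypothetical constant names).
SUPPORTED_PYTHON = ((3, 10), (3, 13))  # inclusive (major, minor) range
EXPECTED_PACKAGES = {"openfl": "1.7.1", "GANDLF": "0.1.0"}  # assumed distribution names


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version lies within SUPPORTED_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    low, high = SUPPORTED_PYTHON
    return low <= major_minor <= high


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not is_supported_python():
        current = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {current} is outside the supported 3.10-3.13 range")
    for name, expected in EXPECTED_PACKAGES.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed (expected {expected})")
            continue
        if installed != expected:
            problems.append(f"{name} {installed} is installed, expected {expected}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```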

Open Issues:

  1. Running the experiment with the "Ray" backend fails in the "end" step of the Federated Flow.
  2. Data loaders returned from GaNDLF cannot be passed as private attributes of collaborators because of a deep-copy issue.
  3. The pre-trained model is currently not compatible with GaNDLF 0.1.0 (because of this issue: https://github.com/mlcommons/GaNDLF/pull/897#issuecomment-2218557301).
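
The deep-copy problem in item 2 can be reproduced without OpenFL or GaNDLF. The sketch below assumes, as the description implies, that private attributes are deep-copied when they are handed to a collaborator. `FakeLoader` is a hypothetical stand-in for a GaNDLF data loader that holds an uncopyable resource; the `threading.Lock` is chosen only for illustration and is not the confirmed cause. One common workaround is to pass a zero-argument factory instead of the loader itself: `copy.deepcopy` returns plain functions unchanged, so each collaborator can build its own loader when it needs it. The Workflow API documentation linked above also appears to describe a `private_attributes_callable` argument for `Collaborator` along these lines; check it against OpenFL 1.7.1 before relying on it.

```python
import copy
import threading


class FakeLoader:
    """Hypothetical stand-in for a data loader that wraps an uncopyable resource."""

    def __init__(self, data):
        self.data = list(data)
        self._lock = threading.Lock()  # thread locks cannot be deep-copied

    def __iter__(self):
        with self._lock:
            yield from self.data


def try_deepcopy(obj):
    """Return (True, copy) on success, or (False, error message) on failure."""
    try:
        return True, copy.deepcopy(obj)
    except TypeError as err:
        return False, str(err)


def make_loader_factory(data):
    """Defer loader construction; copy.deepcopy returns plain functions unchanged."""
    def factory():
        return FakeLoader(data)
    return factory


# Passing the loader object itself fails as soon as it is deep-copied.
ok_direct, error = try_deepcopy({"train_loader": FakeLoader(range(3))})
print(ok_direct, error)  # False, followed by the TypeError message

# Passing a factory succeeds; the loader is built later, where it is used.
ok_factory, attrs = try_deepcopy({"train_loader_factory": make_loader_factory(range(3))})
loader = attrs["train_loader_factory"]()
print(ok_factory, list(loader))  # True [0, 1, 2]
```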

Testing:

Tested the changes locally by running the experiment with a single process on CPU. Testing with a single process on GPU is in progress.

A significant speedup was observed with the Ray backend on 4 GPUs: execution time was reduced by ~75% compared to TaskRunner.
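
To put the ~75% figure in familiar terms: if a run takes a fraction (1 - r) of the baseline time, the speedup factor is 1 / (1 - r), so a 75% reduction is roughly a 4x speedup. Whether that amounts to near-linear scaling across the 4 GPUs depends on the TaskRunner baseline configuration, which is not stated here. The helper name below is hypothetical.

```python
def speedup_from_reduction(reduction):
    """Convert a fractional reduction in wall-clock time into a speedup factor.

    A reduction of 0.75 means the new run takes 25% of the baseline time.
    """
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction)


# The ~75% reduction reported above corresponds to roughly a 4x speedup.
print(speedup_from_reduction(0.75))  # 4.0
```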

kagrawa2, Mar 24 '25 18:03

Thanks for the PR!

@Linardos: would it be possible for you to run through this branch with your existing FeTS Challenge setup to check if it works as expected? On a related note, we should put together a few unit tests for this.

sarthakpati, Mar 25 '25 00:03

Thanks for this, @kagrawa2!

Regarding

Data Loaders returned from GaNDLF cannot be passed as private attributes of collaborators due to the deep-copy issue

Do you happen to have any insights into what is specifically causing this issue? For simulation purposes, it is likely fine, but this seems like it could be a big issue in the long run if participants can potentially access another participant's dataloader.

I've also seen deep-copy issues come up in the past (unrelated to GaNDLF), so it may be worth tracking and addressing them more generically at some point, too.
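
To illustrate the isolation concern: a quick way to make an uncopyable object "survive" `copy.deepcopy` is to override `__deepcopy__` so that it returns the original object. Then every copy is the same object, and anything reachable from one collaborator's attributes is reachable from another's. The sketch below uses a hypothetical `SharedLoader` class purely to show this aliasing; it does not describe how OpenFL or GaNDLF actually handle private attributes.

```python
import copy
import threading


class SharedLoader:
    """Hypothetical loader that 'supports' deepcopy by returning itself."""

    def __init__(self, owner, data):
        self.owner = owner
        self.data = list(data)
        self._lock = threading.Lock()  # uncopyable resource

    def __deepcopy__(self, memo):
        # Skips copying entirely: the "copy" aliases the original object.
        return self


original = SharedLoader("collaborator_1", range(3))
copied = copy.deepcopy({"train_loader": original})["train_loader"]

# deepcopy no longer raises, but both names point to the same object.
print(copied is original)  # True

# A change made through one reference is visible through the other.
copied.data.append(99)
print(original.data)  # [0, 1, 2, 99]
```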

kminhta, Apr 14 '25 15:04

Thanks, I will sync offline with @Linardos to expedite the test + merge.

sarthakpati, Apr 16 '25 19:04

Hi @kagrawa2: this PR needs a bit of TLC to fix the merge conflicts after #201.

sarthakpati, Apr 25 '25 16:04

@sarthakpati I will look into it and rebase.

kagrawa2, Apr 29 '25 07:04