[distributed_dp] Including package versions into the requirements file
Hi everyone,
First of all, thank you very much for providing the very nice distributed_dp package.
I was trying to get it to work and installed the packages referenced in https://github.com/google-research/federated/blob/master/distributed_dp/requirements.txt. Unfortunately, even though I installed the nightly build versions of all the packages as indicated in the README, there seem to be compatibility issues.
I've tried a couple of different combinations of versions for tf, tf-federated, tf-privacy, and tf-estimator, but the code did not run with any of them.
My current setup is
...
python 3.9.7 h12debd9_1
keras-nightly 2.9.0.dev2022030808 pypi_0 pypi
tb-nightly 2.9.0a20220307 pypi_0 pypi
tensorboard 2.8.0 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.6.0 py_0
tensorflow-datasets 4.5.2 pypi_0 pypi
tensorflow-federated-nightly 0.19.0.dev20220218 pypi_0 pypi
tensorflow-io-gcs-filesystem 0.24.0 pypi_0 pypi
tensorflow-metadata 1.7.0 pypi_0 pypi
tensorflow-model-optimization 0.7.1 pypi_0 pypi
tensorflow-privacy 0.7.3 pypi_0 pypi
tensorflow-probability 0.15.0 pypi_0 pypi
tf-estimator-nightly 2.9.0.dev2022030809 pypi_0 pypi
tf-nightly 2.9.0.dev20220308 pypi_0 pypi
...
In this setup, running bazel run :fl_run gives me the following error:
Traceback (most recent call last):
File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/fl_run.py", line 28, in <module>
from distributed_dp import fl_utils
File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/fl_utils.py", line 22, in <module>
from distributed_dp import accounting_utils
File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/accounting_utils.py", line 21, in <module>
import tensorflow_privacy as tfp
File "/home/fraboeni/.conda/envs/tf-federated/lib/python3.9/site-packages/tensorflow_privacy/__init__.py", line 30, in <module>
from tensorflow_privacy import v1
File "/home/fraboeni/.conda/envs/tf-federated/lib/python3.9/site-packages/tensorflow_privacy/v1/__init__.py", line 32, in <module>
from tensorflow_privacy.privacy.estimators.v1.dnn import DNNClassifier as DNNClassifierV1
File "/home/fraboeni/.conda/envs/tf-federated/lib/python3.9/site-packages/tensorflow_privacy/privacy/estimators/v1/dnn.py", line 19, in <module>
from tensorflow_privacy.privacy.estimators.v1 import head as head_lib
File "/home/fraboeni/.conda/envs/tf-federated/lib/python3.9/site-packages/tensorflow_privacy/privacy/estimators/v1/head.py", line 22, in <module>
from tensorflow.python.ops import lookup_ops # pylint: disable=g-direct-tensorflow-import
ImportError: cannot import name 'lookup_ops' from 'tensorflow.python.ops' (unknown location)
My question now is the following: could you share the version numbers in your requirements.txt file with which the code runs successfully?
Hi @fraboeni,
Thanks for your interest! I just tried locally cloning the repo and starting a new conda environment, and I was able to get it running using the following commands:
conda create -n tff python=3.9
conda activate tff
pip install -r requirements.txt # inside `distributed_dp/`
pip install tensorflow-addons
bazel run :fl_run # the example command for EMNIST
The specific versions of the related packages:
...
python 3.9.7 h88f2d9e_1
tensorboard 2.8.0 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tensorflow 2.8.0 pypi_0 pypi
tensorflow-addons 0.16.1 pypi_0 pypi
tensorflow-datasets 4.5.2 pypi_0 pypi
tensorflow-estimator 2.8.0 pypi_0 pypi
tensorflow-federated 0.20.0 pypi_0 pypi
tensorflow-io-gcs-filesystem 0.24.0 pypi_0 pypi
tensorflow-metadata 1.7.0 pypi_0 pypi
tensorflow-model-optimization 0.7.1 pypi_0 pypi
tensorflow-privacy 0.7.3 pypi_0 pypi
tensorflow-probability 0.16.0 pypi_0 pypi
tf-estimator-nightly 2.8.0.dev2021122109 pypi_0 pypi
...
It seems that nightly builds are not needed, but you would need tensorflow-addons, which was not specified in requirements.txt. Could you try and see if the above works?
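For reference, a pinned requirements.txt matching the environment above might look roughly like this (just a sketch based on the versions listed, not the checked-in file):
tensorflow==2.8.0
tensorflow-federated==0.20.0
tensorflow-privacy==0.7.3
tensorflow-model-optimization==0.7.1
tensorflow-probability==0.16.0
tensorflow-datasets==4.5.2
tensorflow-addons==0.16.1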
Thank you so much for your help with that @kenziyuliu. The installation worked just fine.
Now I am running into different errors. I ran bazel run :fl_run and got:
INFO: Analyzed target //distributed_dp:fl_run (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //distributed_dp:fl_run up-to-date:
bazel-bin/distributed_dp/fl_run
INFO: Elapsed time: 0.120s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
E0310 17:25:35.563270 139699288257280 optimizer_utils.py:264] Unknown optimizer [None], known optimziers are [['sgd', 'adagrad', 'adam', 'yogi', 'lars', 'lamb', 'shampoo']]. To add support for an optimizer, add the optimzier class to the utils_impl._SUPPORTED_OPTIMIZERS list.
Traceback (most recent call last):
File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/fl_run.py", line 290, in <module>
app.run(main)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/fl_run.py", line 185, in main
client_optimizer_fn = optimizer_utils.create_optimizer_fn_from_flags('client')
File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/utils/optimizers/optimizer_utils.py", line 269, in create_optimizer_fn_from_flags
raise ValueError('`{!s}` is not a valid optimizer for flag --{!s}, must be '
ValueError: `None` is not a valid optimizer for flag --client_optimizer, must be one of ['sgd', 'adagrad', 'adam', 'yogi', 'lars', 'lamb', 'shampoo']. See error log for details.
The issue did not occur when specifying the flags as in your example:
bazel run :fl_run -- \
--task=emnist_character \
--server_optimizer=sgd \
--server_learning_rate=1 \
--server_sgd_momentum=0.9 \
--client_optimizer=sgd \
--client_learning_rate=0.03 \
--client_batch_size=20 \
--experiment_name=my_emnist_test \
--epsilon=10 \
--l2_norm_clip=0.03 \
--dp_mechanism=ddgauss \
--logtostderr
This started out very promisingly, but then I got a different error:
I0310 17:29:00.706568 139991547269888 fl_utils.py:71] Shared DP Parameters:
I0310 17:29:00.706730 139991547269888 fl_utils.py:72] {'clip': 0.03,
'delta': 0.0002941176470588235,
'dim': 1018174,
'epsilon': 10.0,
'mechanism': 'ddgauss',
'num_clients': 3400,
'num_clients_per_round': 100,
'num_rounds': 1500,
'sampling_rate': 0.029411764705882353}
I0310 17:30:57.426323 139991547269888 fl_utils.py:151] ddgauss parameters:
I0310 17:30:57.426513 139991547269888 fl_utils.py:152] {'beta': 0.6065306597126334,
'bits': 16,
'dim': 1018174,
'gamma': 3.292593044721554e-06,
'inflated_l2': 0.030049064475707276,
'k_stddevs': 4,
'local_stddev': 0.002681329925591648,
'mechanism': 'ddgauss',
'noise_mult_clip': 0.8937766418638827,
'noise_mult_inflated': 0.8923172725591274,
'padded_dim': 1048576.0,
'scale': 303711.99429067835}
I0310 17:30:57.426573 139991547269888 ddpquery_utils.py:44] Conditional rounding set to True (beta = 0.606531)
I0310 17:30:57.510118 139991547269888 keras_utils.py:365] Adding default num_examples metric to model
I0310 17:30:57.510220 139991547269888 keras_utils.py:368] Adding default num_batches metric to model
I0310 17:30:58.755060 139991547269888 keras_utils.py:365] Adding default num_examples metric to model
I0310 17:30:58.755179 139991547269888 keras_utils.py:368] Adding default num_batches metric to model
I0310 17:31:00.380089 139991547269888 keras_utils.py:365] Adding default num_examples metric to model
I0310 17:31:00.380198 139991547269888 keras_utils.py:368] Adding default num_batches metric to model
I0310 17:31:02.371215 139991547269888 keras_utils.py:365] Adding default num_examples metric to model
I0310 17:31:02.371326 139991547269888 keras_utils.py:368] Adding default num_batches metric to model
I0310 17:31:02.647132 139991547269888 keras_utils.py:365] Adding default num_examples metric to model
I0310 17:31:02.647240 139991547269888 keras_utils.py:368] Adding default num_batches metric to model
I0310 17:31:02.859875 139991547269888 training_utils.py:68] Writing...
I0310 17:31:02.859981 139991547269888 training_utils.py:69] program state to: /tmp/ddp_fl/checkpoints/my_emnist_test
I0310 17:31:02.860028 139991547269888 training_utils.py:70] CSV metrics to: /tmp/ddp_fl/results/my_emnist_test/experiment.metrics.csv
I0310 17:31:02.860080 139991547269888 training_utils.py:71] TensorBoard summaries to: /tmp/ddp_fl/logdir/my_emnist_test
I0310 17:31:02.860128 139991547269888 training_loop.py:189] Running training process
I0310 17:31:03.333363 139991547269888 training_loop.py:201] Initializing training process
I0310 17:31:03.397290 139991547269888 training_loop.py:115] Running evaluation at round 0
Traceback (most recent call last):
File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/fl_run.py", line 290, in <module>
app.run(main)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/fl_run.py", line 274, in main
state = tff.simulation.run_training_process(
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/simulation/training_loop.py", line 206, in run_training_process
evaluation_metrics = _run_evaluation(evaluation_fn,
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/simulation/training_loop.py", line 119, in _run_evaluation
evaluation_metrics = evaluation_fn(state, evaluation_data)
File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/fl_run.py", line 270, in evaluation_fn
return federated_eval(state.model, evaluation_data)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/computation/computation_impl.py", line 119, in __call__
return context.invoke(self, arg)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/execution_contexts/sync_execution_context.py", line 65, in invoke
return self._event_loop.run_until_complete(
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/retrying.py", line 91, in retry_coro_fn
raise e
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/retrying.py", line 88, in retry_coro_fn
return await fn(*args, **kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/execution_contexts/async_execution_context.py", line 300, in invoke
return await tracing.wrap_coroutine_in_current_trace_context(
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 391, in _wrapped
return await coro
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/execution_contexts/async_execution_context.py", line 141, in _invoke
result = await executor.create_call(comp, arg)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 346, in create_call
return await comp_repr.invoke(self, arg)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 166, in invoke
return await executor._evaluate(comp_lambda.result, new_scope) # pylint: disable=protected-access
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 516, in _evaluate
return await self._evaluate_block(comp, scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 480, in _evaluate_block
return await self._evaluate(comp.block.result, scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 508, in _evaluate
return await self._evaluate_reference(comp, scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 432, in _evaluate_reference
return await scope.resolve_reference(comp.reference.name)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 115, in resolve_reference
return await value
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 510, in _evaluate
return await self._evaluate_call(comp, scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 448, in _evaluate_call
func, arg = await asyncio.gather(func, get_arg())
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 444, in get_arg
return await self._evaluate(comp.call.argument, scope=scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 514, in _evaluate
return await self._evaluate_struct(comp, scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 468, in _evaluate_struct
values = await asyncio.gather(*values)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 508, in _evaluate
return await self._evaluate_reference(comp, scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 432, in _evaluate_reference
return await scope.resolve_reference(comp.reference.name)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 115, in resolve_reference
return await value
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 510, in _evaluate
return await self._evaluate_call(comp, scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 448, in _evaluate_call
func, arg = await asyncio.gather(func, get_arg())
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 444, in get_arg
return await self._evaluate(comp.call.argument, scope=scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 514, in _evaluate
return await self._evaluate_struct(comp, scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 468, in _evaluate_struct
values = await asyncio.gather(*values)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 508, in _evaluate
return await self._evaluate_reference(comp, scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 432, in _evaluate_reference
return await scope.resolve_reference(comp.reference.name)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 115, in resolve_reference
return await value
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 510, in _evaluate
return await self._evaluate_call(comp, scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 448, in _evaluate_call
func, arg = await asyncio.gather(func, get_arg())
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 444, in get_arg
return await self._evaluate(comp.call.argument, scope=scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 508, in _evaluate
return await self._evaluate_reference(comp, scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 432, in _evaluate_reference
return await scope.resolve_reference(comp.reference.name)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 115, in resolve_reference
return await value
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 510, in _evaluate
return await self._evaluate_call(comp, scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 448, in _evaluate_call
func, arg = await asyncio.gather(func, get_arg())
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 444, in get_arg
return await self._evaluate(comp.call.argument, scope=scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 514, in _evaluate
return await self._evaluate_struct(comp, scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 468, in _evaluate_struct
values = await asyncio.gather(*values)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 508, in _evaluate
return await self._evaluate_reference(comp, scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 432, in _evaluate_reference
return await scope.resolve_reference(comp.reference.name)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 115, in resolve_reference
return await value
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 510, in _evaluate
return await self._evaluate_call(comp, scope)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 449, in _evaluate_call
return await self.create_call(func, arg=arg)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 342, in create_call
return ReferenceResolvingExecutorValue(await
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/thread_delegating_executor.py", line 125, in create_call
return await self._delegate(self._target_executor.create_call(comp, arg))
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/thread_delegating_executor.py", line 110, in _delegate
result_value = await _delegate_with_trace_ctx(coro, self._event_loop)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 391, in _wrapped
return await coro
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/federating_executor.py", line 457, in create_call
return await self._strategy.compute_federated_intrinsic(
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/federating_executor.py", line 143, in compute_federated_intrinsic
return await fn(arg) # pylint: disable=not-callable
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/federated_resolving_strategy.py", line 458, in compute_federated_map
return await self._map(arg, all_equal=False)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/federated_resolving_strategy.py", line 339, in _map
results = await asyncio.gather(*[
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/federated_resolving_strategy.py", line 336, in _map_child
fn_at_child = await child.create_value(fn, fn_type)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/thread_delegating_executor.py", line 115, in create_value
return await self._delegate(
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/thread_delegating_executor.py", line 110, in _delegate
result_value = await _delegate_with_trace_ctx(coro, self._event_loop)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 391, in _wrapped
return await coro
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
result = await fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/eager_tf_executor.py", line 683, in create_value
normalized_value = to_representation_for_type(value,
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 228, in sync_trace
result = fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/eager_tf_executor.py", line 519, in to_representation_for_type
return _to_computation_internal_rep(
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 228, in sync_trace
result = fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/eager_tf_executor.py", line 405, in _to_computation_internal_rep
embedded_fn = embed_tensorflow_computation(value, type_spec, device)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 228, in sync_trace
result = fn(*fn_args, **fn_kwargs)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/eager_tf_executor.py", line 273, in embed_tensorflow_computation
comp = _ensure_comp_runtime_compatible(comp)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/eager_tf_executor.py", line 246, in _ensure_comp_runtime_compatible
_check_dataset_reduce_for_multi_gpu(graph_def)
File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/eager_tf_executor.py", line 63, in _check_dataset_reduce_for_multi_gpu
raise ValueError(
ValueError: Detected dataset reduce op in multi-GPU TFF simulation: `use_experimental_simulation_loop=True` for `tff.learning`; or use `for ... in iter(dataset)` for your own dataset iterations. See https://www.tensorflow.org/federated/tutorials/simulations_with_accelerators for examples.
I tried to fix that by disabling GPU execution, inserting the following lines here: https://github.com/google-research/federated/blob/ed50f1e19c24086b480b7c5b85c6376a1a9ef1c6/distributed_dp/fl_run.py#L33 (following the tutorial: https://www.tensorflow.org/federated/tutorials/simulations_with_accelerators):
cpu_device = tf.config.list_logical_devices('CPU')[0]
tff.backends.native.set_local_python_execution_context(
server_tf_device=cpu_device, client_tf_devices=[cpu_device])
and simply re-ran the command.
However, the error stayed the same. Would I have to do some kind of rebuild, or can you recommend another way to get rid of the error coming from tff?
Thank you very much!
@fraboeni Can you see what happens if you try toggling this line: https://github.com/google-research/federated/blob/ed50f1e19c24086b480b7c5b85c6376a1a9ef1c6/distributed_dp/fl_run.py#L251
For context, the client training that is part of tff.learning.build_federated_averaging_process can go in one of two ways depending on whether you set use_experimental_simulation_loop to True or False. Generally, setting this to True is for multi-GPU simulations.
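For illustration only, here is a minimal sketch of where that flag goes (the toy model_fn below just stands in for the script's task.model_fn; it is not the actual fl_run.py code):
import tensorflow as tf
import tensorflow_federated as tff

# Toy stand-in for `task.model_fn`, only to make the sketch self-contained.
def model_fn():
  keras_model = tf.keras.Sequential([
      tf.keras.layers.InputLayer(input_shape=(784,)),
      tf.keras.layers.Dense(10),
  ])
  return tff.learning.from_keras_model(
      keras_model,
      input_spec=(tf.TensorSpec([None, 784], tf.float32),
                  tf.TensorSpec([None, 1], tf.int32)),
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# `use_experimental_simulation_loop` controls how client datasets are iterated;
# True is the variant intended for (multi-)GPU simulations.
iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.03),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0),
    use_experimental_simulation_loop=True)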
Also for context, @kenziyuliu: I believe the nightly TFF packages are currently broken, and using the latest released version is the recommended way to proceed (as in your comment above).
Thanks for your prompt answer @zcharles8!
Unfortunately, no matter whether I set the indicated line to True or False, I still get the same error.
@fraboeni Is that true if you don't add the call to tff.backends.native.set_local_python_execution_context that you described above?
For context, I just ran the command you posted above (purely on CPU) and it worked fine using the default executor.
Oh wait, I see the potential problem. @fraboeni It sounds like you are using a multi-GPU environment based on the error. If that is the case then you would need to alter this line: https://github.com/google-research/federated/blob/master/distributed_dp/fl_run.py#L266
In particular, set use_experimental_simulation_loop=True, matching the argument in tff.learning.build_federated_averaging_process. Let me know if that helps at all, and thanks for digging into this.
Thank you very much @zcharles8.
Unfortunately, passing the parameter in the line you indicated also does not solve the issue:
federated_eval = tff.learning.build_federated_evaluation(task.model_fn, use_experimental_simulation_loop=True)
I also tried switching off GPUs using
cpu_device = tf.config.list_logical_devices('CPU')[0]
tff.backends.native.set_local_python_execution_context(
server_tf_device=cpu_device, client_tf_devices=[cpu_device])
or using only one GPU via that same call. Unfortunately, nothing seems to change the error.
Hi @zcharles8, is there any news from your side on how we could make the code here run?
Hi @fraboeni, I tried following https://github.com/google-research/federated/issues/57#issuecomment-1062468566 on a single-GPU machine, and by default things seem to work fine.
Specifically, I followed https://github.com/google-research/federated/issues/57#issuecomment-1062468566, fixed the error in https://github.com/google-research/federated/issues/58, and checked that TF sees the GPU as
>>> tf.config.list_physical_devices()
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Running the example script from here seems to work (bazel run :fl_run -- ...). If it's a multi-GPU issue, maybe try forcing a single GPU as a workaround via export CUDA_VISIBLE_DEVICES=0. Hope this helps!
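As an alternative to the environment variable, restricting TF to one GPU from within Python should also work (a sketch only, not part of the script; it has to run before any TF ops initialize the GPUs):
import tensorflow as tf

# Make only the first GPU visible to TensorFlow; must happen before GPU initialization.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
  tf.config.set_visible_devices(gpus[0], 'GPU')
print(tf.config.list_logical_devices('GPU'))  # should now show a single GPU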
Can anyone help me solve the same issue while using tff.templates.IterativeProcess instead of tff.learning.build_federated_averaging_process?
Could you please expand on what exactly you are doing? Are you creating a custom iterative process or using one that we provide in the repo? Could you also please provide a snippet of the error you are seeing?