federated
database is locked
When I try to increase train_clients_per_round, I get a "database is locked" error. I then individually tested each client that appeared in an error and found that they were all accessible.
Hi @aixiangwang. Can you provide the information requested on the new bug template, including:
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
- Python package versions (e.g., TensorFlow Federated, TensorFlow)
- A minimal reproduction of the bug.
This looks like some edge case around the SQL-backed datasets TFF provides, but without the information above I'm not certain what's actually going on.
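The environment details requested above can be gathered with a short script. This is a minimal sketch using only the Python standard library; the function name `collect_environment` is hypothetical, and the distribution names `tensorflow` and `tensorflow-federated` are assumed to match how the packages were installed with pip.

```python
import platform
from importlib import metadata


def collect_environment(packages=("tensorflow", "tensorflow-federated")):
    """Collects OS, Python, and package versions for a bug report.

    The default package names are assumptions about the pip distribution
    names; adjust them if the packages were installed under other names.
    """
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report a missing package instead of failing the whole script.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue covers the first two items of the template.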
Thank you for your reply. I use TensorFlow 2.8 and TensorFlow Federated 0.21 on Windows 10.
You can see that client_id = 'f3928_39', the one named in the error, is among the latest 10 randomly selected clients, and that is what triggers the error. But client_id = 'f3928_39' had already been selected several times before, so I'm very confused. Looking forward to your reply!
Could you add the full code that actually causes this bug? Even better, if you can narrow it down to a smaller reproduction of it, that'd be really helpful.
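A smaller reproduction is easier to build when the same clients are sampled on every run and each sampled client can be loaded in isolation. The following is a rough sketch under those assumptions, not code from the TFF example: `sample_clients`, `find_failing_clients`, and the `load_fn` callback are hypothetical names, and `load_fn` stands in for whatever call loads one client's data.

```python
import random


def sample_clients(client_ids, clients_per_round, round_num, base_seed=0):
    """Samples clients without replacement, deterministically per round."""
    # Seeding with the round number makes every run pick the same clients
    # for a given round, so a failing round can be replayed exactly.
    rng = random.Random(base_seed + round_num)
    return rng.sample(list(client_ids), clients_per_round)


def find_failing_clients(client_ids, load_fn):
    """Calls load_fn on each client and records the ones that raise."""
    failures = {}
    for client_id in client_ids:
        try:
            load_fn(client_id)
        except Exception as err:  # Broad on purpose: this is a diagnostic.
            failures[client_id] = repr(err)
    return failures
```

With a fixed `base_seed`, the round that fails can be rerun on its own, and `find_failing_clients` shows whether any single client fails when loaded outside the training loop.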
The complete code example is /tensorflow_federated/ simple_fedavg in the current GitHub project path. I only increased the train_clients_per_round parameter in emnist_fedavg_main.py from 2 to 10; this parameter is the number of clients sampled per round.
Some of the key code in the example is shown below:

train_data, test_data = get_emnist_dataset()

def tff_model_fn():
  """Constructs a fully initialized model for use in federated averaging."""
  keras_model = create_original_fedavg_cnn_model(only_digits=True)
  loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
  metrics = [tf.keras.metrics.SparseCategoricalAccuracy()]
  return tff.learning.from_keras_model(
      keras_model,
      loss=loss,
      metrics=metrics,
      input_spec=train_data.element_type_structure)

iterative_process = simple_fedavg_tff.build_federated_averaging_process(
    tff_model_fn, server_optimizer_fn, client_optimizer_fn)
server_state = iterative_process.initialize()

keras_model = create_original_fedavg_cnn_model(only_digits=True)

for round_num in range(FLAGS.total_rounds):
  sampled_clients = np.random.choice(
      train_data.client_ids,
      size=FLAGS.train_clients_per_round,
      replace=False)
  print(sampled_clients)
  sampled_train_data = [
      train_data.create_tf_dataset_for_client(client)
      for client in sampled_clients
  ]
  server_state, train_metrics = iterative_process.next(
      server_state, sampled_train_data)
  print(f'Round {round_num}')
  print(f'\tTraining metrics: {train_metrics}')
  if round_num % FLAGS.rounds_per_eval == 0:
    server_state.model.assign_weights_to(keras_model)
    accuracy = evaluate(keras_model, test_data)
    print(f'\tValidation accuracy: {accuracy * 100.0:.2f}%')
As shown in the figure below, the dataset loaded successfully and several rounds completed without errors.
However, an error occurs in a later round, as shown in the figure: the database is locked, so the next federated round fails.
Looking forward to your reply!
@aixiangwang I have not been able to repro this issue, and we have seen no other reports about this.
I suspect this might be something about the environment you are executing in. A similar type of error occurred in https://github.com/tensorflow/federated/issues/3479, where the cause was that the user had cached the dataset to a locked directory.
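One way to check the environment is to probe the cached database file directly, outside of TFF. This is a minimal sketch using Python's standard `sqlite3` module. It assumes the cached dataset is an SQLite file whose location you already know; `probe_sqlite_file` is a hypothetical helper, not a TFF API. The directory check reflects one plausible cause, since SQLite may need to create temporary files next to the database.

```python
import os
import sqlite3


def probe_sqlite_file(db_path, timeout_s=1.0):
    """Returns a short status string describing whether db_path is readable."""
    if not os.path.isfile(db_path):
        return "missing"
    parent = os.path.dirname(os.path.abspath(db_path))
    # Assumption: a non-writable directory is a likely source of lock errors.
    if not os.access(parent, os.W_OK):
        return "directory not writable"
    conn = sqlite3.connect(db_path, timeout=timeout_s)
    try:
        # A trivial read; it fails if another connection holds an
        # exclusive lock for longer than timeout_s.
        conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
    except sqlite3.Error as err:
        return f"error: {err}"
    finally:
        conn.close()
    return "ok"
```

If the probe reports `error: database is locked` while no training job is running, another process or a stale lock on the cache directory is a likely suspect.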