federated database is locked

When I try to increase train_clients_per_round, I get an error saying that the database is locked. I then tested each client named in the errors individually and found that they were all accessible.

Jun 25 '22 07:06 aixiangwang

Hi @aixiangwang. Can you provide the information requested on the new bug template, including:

OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Python package versions (e.g., TensorFlow Federated, TensorFlow)
A minimal reproduction of the bug.

This looks like some edge case around the SQL-backed datasets TFF provides, but without the information above I'm not certain what's actually going on.
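For context, "database is locked" is the message SQLite reports when a connection cannot obtain the lock it needs. The following is a minimal sketch of how that message can arise in plain `sqlite3`, assuming the error in this issue comes from the underlying SQLite file. It is not TFF code; the function name, table name, and file path are hypothetical.

```python
import os
import sqlite3
import tempfile


def provoke_locked_error(db_path):
    """Return the error message a reader sees while a writer holds an exclusive lock."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN EXCLUSIVE below is the only open transaction.
    writer = sqlite3.connect(db_path, isolation_level=None)
    writer.execute("CREATE TABLE IF NOT EXISTS examples (value INTEGER)")
    writer.execute("INSERT INTO examples VALUES (1)")
    writer.execute("BEGIN EXCLUSIVE")  # hold the lock until ROLLBACK

    # timeout=0 makes the reader fail immediately instead of waiting.
    reader = sqlite3.connect(db_path, timeout=0)
    try:
        reader.execute("SELECT value FROM examples").fetchall()
        message = None
    except sqlite3.OperationalError as err:
        message = str(err)
    finally:
        reader.close()
        writer.execute("ROLLBACK")
        writer.close()
    return message


with tempfile.TemporaryDirectory() as tmp_dir:
    locked_message = provoke_locked_error(os.path.join(tmp_dir, "demo.sqlite"))
    print(locked_message)
```

If something similar happens here, another process or handle holding the file would be one candidate cause, but that is only a guess until there is a reproduction.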

Jun 27 '22 16:06 zcharles8

Thank you for your reply. I am using TensorFlow 2.8 and TensorFlow Federated 0.21 on Windows 10. As you can see, client_id = 'f3928_39', the client named in the error, is among the latest 10 randomly selected clients, and that is what triggers the error. However, client_id = 'f3928_39' had already been selected several times before, so I am very confused. Looking forward to your reply!

Jun 28 '22 02:06 aixiangwang

Could you add the full code that actually causes this bug? Even better, if you can narrow it down to a smaller reproduction of it, that'd be really helpful.

Jun 28 '22 15:06 zcharles8

The complete code is the simple_fedavg example under /tensorflow_federated/ in the current GitHub project. I only increased the train_clients_per_round parameter in emnist_fedavg_main.py from 2 to 10; this parameter sets the number of clients sampled per round.

Some of the key code in the example is shown below:

    train_data, test_data = get_emnist_dataset()

    def tff_model_fn():
        """Constructs a fully initialized model for use in federated averaging."""
        keras_model = create_original_fedavg_cnn_model(only_digits=True)
        loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
        metrics = [tf.keras.metrics.SparseCategoricalAccuracy()]
        return tff.learning.from_keras_model(
            keras_model,
            loss=loss,
            metrics=metrics,
            input_spec=train_data.element_type_structure)

    iterative_process = simple_fedavg_tff.build_federated_averaging_process(
        tff_model_fn, server_optimizer_fn, client_optimizer_fn)
    server_state = iterative_process.initialize()
    keras_model = create_original_fedavg_cnn_model(only_digits=True)

    for round_num in range(FLAGS.total_rounds):
        sampled_clients = np.random.choice(
            train_data.client_ids,
            size=FLAGS.train_clients_per_round,
            replace=False)
        print(sampled_clients)
        sampled_train_data = [
            train_data.create_tf_dataset_for_client(client)
            for client in sampled_clients
        ]
        server_state, train_metrics = iterative_process.next(
            server_state, sampled_train_data)
        print(f'Round {round_num}')
        print(f'\tTraining metrics: {train_metrics}')
        if round_num % FLAGS.rounds_per_eval == 0:
            server_state.model.assign_weights_to(keras_model)
            accuracy = evaluate(keras_model, test_data)
            print(f'\tValidation accuracy: {accuracy * 100.0:.2f}%')

As shown in the figure below, the dataset loaded successfully and several rounds completed without problems. However, at a later round the error shown in the figure occurs: the database is locked and the next step of the federated process fails. Looking forward to your reply!
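One way to narrow a failure like this down is to load each sampled client on its own, outside the training step, and record which ones raise. The helper below is a hypothetical diagnostic sketch, not part of TFF; `find_failing_clients` and `load_client_fn` are names introduced here for illustration.

```python
def find_failing_clients(client_ids, load_client_fn):
    """Try each client on its own and collect the ones that raise."""
    failures = {}
    for client_id in client_ids:
        try:
            # Iterate over the whole dataset so that lazy reads are exercised too.
            for _ in load_client_fn(client_id):
                pass
        except Exception as err:  # deliberately broad: this is only a diagnostic
            failures[client_id] = repr(err)
    return failures
```

With the code above, this could be called as `find_failing_clients(sampled_clients, train_data.create_tf_dataset_for_client)` right before `iterative_process.next`. If every client loads cleanly on its own but the round still fails, that would point toward concurrent access rather than a single bad client.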

Aug 01 '22 08:08 aixiangwang

@aixiangwang I have not been able to repro this issue, and we have seen no other reports about this.

I suspect this is related to the environment you are running in. A similar error was reported in https://github.com/tensorflow/federated/issues/3479; there, the cause was that the user had cached the dataset to a locked directory.
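A rough way to check for this kind of environment problem is to inspect the cached database file directly. The sketch below assumes the cached dataset is an ordinary SQLite file whose path you already know; `check_sqlite_file` is a hypothetical helper, and the location TFF actually uses for its cache is not assumed here.

```python
import os
import sqlite3


def check_sqlite_file(db_path):
    """Report basic access problems for a SQLite file and its directory."""
    directory = os.path.dirname(os.path.abspath(db_path))
    report = {
        "exists": os.path.isfile(db_path),
        "dir_writable": os.access(directory, os.W_OK),
        "readable": False,
        "error": None,
    }
    if not report["exists"]:
        return report
    try:
        # timeout=0: fail immediately rather than waiting on a lock.
        conn = sqlite3.connect(db_path, timeout=0)
        try:
            conn.execute("SELECT name FROM sqlite_master LIMIT 1").fetchall()
            report["readable"] = True
        finally:
            conn.close()
    except sqlite3.Error as err:
        report["error"] = str(err)
    return report
```

A result with `dir_writable` set to False, or an `error` that mentions a lock, would be consistent with the locked-directory explanation, though it would not prove it.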

Mar 16 '23 15:03 zcharles8
