Can client_num_in_total only be set to 4 in FedGraphNN's link prediction?
When client_num_in_total is set to 4, training runs normally and the evaluation metrics look normal. When I set it to another value (such as 6), the evaluation metrics become abnormal: Test MAE = 20714260267008.0, mae = 20714260267008.0, rmse = 20714260267008.0, mse = 4.290806749939617e+26, which is obviously unreasonable. The log output is as follows:
======== FedML (https://fedml.ai) ======== FedML version: 0.7.286 Execution path:/root/miniconda3/lib/python3.8/site-packages/fedml/init.py
======== Running Environment ======== OS: Linux-5.4.0-96-generic-x86_64-with-glibc2.17 Hardware: x86_64 Python version: 3.8.10 (default, Jun 4 2021, 15:09:15) [GCC 7.5.0] PyTorch version: 1.11.0 MPI4py is installed
======== CPU Configuration ======== The CPU usage is : 6% Available CPU Memory: 332.5 G / 376.05326080322266G
======== GPU Configuration ========
NVIDIA GPU Info: <pynvml.nvml.LP_struct_c_nvmlDevice_t object at 0x7f74b73674c0>
Available GPU memory: 10.8 G / 11.0G
[]
args.client_id_list = None
args.client_id_list is not None
Epoch = 0, Iter = 1/1: Test score = 3.0601377487182617
Current best = 0
Epoch = 1, Iter = 1/1: Test score = 14.168010711669922
Current best = 0
Epoch = 2, Iter = 1/1: Test score = 222.92408752441406
Current best = 0
Epoch = 3, Iter = 1/1: Test score = 179197.703125
Current best = 0
Epoch = 4, Iter = 1/1: Test score = 82857024290816.0
Current best = 0
Epoch = 0, Iter = 1/1: Test score = inf
Current best = 0
[FedML-Server(0) @device-id-0] [Fri, 05 Aug 2022 13:37:13] [ERROR] [mlops_runtime_log.py:34:handle_exception] Uncaught exception
Traceback (most recent call last):
File "fedml_subgraph_link_prediction.py", line 84, in
Hello,
May I ask what your hyperparameters are? Could you try LR = 0.005? I believe the problem may be caused by a high learning rate such as 0.01.
We're investigating it though.
Thanks!
Hello,
You are right! Now it works!
Thanks!
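For later readers, the effect suggested above (a learning rate of 0.01 blowing up while 0.005 trains normally) can be illustrated with a toy problem. This is a minimal sketch, not FedGraphNN code: it runs plain gradient descent on a one-dimensional quadratic, and the `curvature` value of 300 is a hypothetical number chosen only so that the two learning rates fall on either side of the stability threshold `2 / curvature`.

```python
def run_gradient_descent(lr, curvature=300.0, w0=1.0, steps=5):
    """Minimize f(w) = 0.5 * curvature * w**2 and return the loss after each step.

    curvature is a hypothetical value for illustration; each update multiplies
    w by (1 - lr * curvature), so the iterates grow when lr > 2 / curvature.
    """
    w = w0
    losses = []
    for _ in range(steps):
        grad = curvature * w          # derivative of 0.5 * curvature * w**2
        w = w - lr * grad             # plain gradient descent update
        losses.append(0.5 * curvature * w * w)
    return losses


high = run_gradient_descent(lr=0.01)   # above the threshold: loss grows each step
low = run_gradient_descent(lr=0.005)   # below the threshold: loss shrinks each step

print("lr=0.01 :", high)
print("lr=0.005:", low)
```

The growth pattern with the larger step resembles the rising test scores in the log above, although the real model's loss surface is of course not a simple quadratic, and the threshold for a given run depends on the data and the number of clients.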
Hello, I would like to ask two questions about the partitioning of the ciao dataset.
First: the ciao dataset has 28 item categories. If I create 28 clients, does each client correspond to one item category? For example, would the items of category 0, together with the users who interacted with them, constitute the local subgraph of client 0?
Second: does a client's id correspond to the item category? For example, if the current ciao dataset contains only the 2nd and 10th categories (extracted from the original ciao dataset), does client 0 represent the items of the 2nd category and client 1 the items of the 10th category? Is my understanding correct?
Hope to hear back, thanks!
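To make the question concrete, the mapping it describes can be sketched as below. This only illustrates the asker's assumption; it is not confirmed to be how the FedGraphNN partitioner behaves. The function name `partition_by_category` and the `(user, item, category, rating)` tuple layout are hypothetical.

```python
from collections import defaultdict


def partition_by_category(interactions):
    """Group (user, item, category, rating) tuples into one edge list per category.

    Hypothetical helper: client ids are assigned by the sorted order of the
    categories that are actually present in the data.
    """
    by_category = defaultdict(list)
    for user, item, category, rating in interactions:
        by_category[category].append((user, item, rating))

    # Client 0 gets the smallest category present, client 1 the next, and so on.
    client_to_category = dict(enumerate(sorted(by_category)))
    client_edges = {cid: by_category[cat] for cid, cat in client_to_category.items()}
    return client_to_category, client_edges


# Toy data containing only categories 2 and 10, as in the example above.
toy = [(0, 10, 2, 4.0), (1, 11, 10, 3.0), (0, 12, 10, 5.0), (2, 10, 2, 1.0)]
client_to_category, client_edges = partition_by_category(toy)
print(client_to_category)
print(client_edges)
```

Under this assumed scheme, client 0 would hold the category 2 interactions and client 1 the category 10 interactions.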
Please refer to #464
Issue has been resolved.