Metrics are not shown in the web UI

Open Jia-py opened this issue 1 year ago • 0 comments

Describe the issue: metrics exist in the .nni/metrics file but are not shown in the web UI.

Environment:

  • NNI version: 2.10.1
  • Training service (local|remote|pai|aml|etc): local
  • Client OS: ubuntu
  • Server OS (for remote mode only):
  • Python version: 3.8
  • PyTorch/TensorFlow version: pytorch 1.11.0
  • Is conda/virtualenv/venv used?: conda
  • Is running in Docker?:

Configuration:

  • Experiment config (remember to remove secrets!):
  • Search space:

Log message:

  • nnimanager.log:
[2023-09-08 21:48:22] INFO (main) Start NNI manager
[2023-09-08 21:48:22] INFO (NNIDataStore) Datastore initialization done
[2023-09-08 21:48:22] INFO (RestServer) Starting REST server at port 8080, URL prefix: "/"
[2023-09-08 21:48:22] WARNING (NNITensorboardManager) Tensorboard may not installed, if you want to use tensorboard, please check if tensorboard installed.
[2023-09-08 21:48:22] INFO (RestServer) REST server started.
[2023-09-08 21:48:23] INFO (NNIManager) Starting experiment: ex5bd2uk
[2023-09-08 21:48:23] INFO (NNIManager) Setup training service...
[2023-09-08 21:48:23] INFO (LocalTrainingService) Construct local machine training service.
[2023-09-08 21:48:23] INFO (NNIManager) Setup tuner...
[2023-09-08 21:48:23] INFO (NNIManager) Change NNIManager status from: INITIALIZED to: RUNNING
[2023-09-08 21:48:23] INFO (NNIManager) Add event listeners
[2023-09-08 21:48:23] INFO (LocalTrainingService) Run local machine training service.
[2023-09-08 21:48:23] INFO (NNIManager) NNIManager received command from dispatcher: ID, 
[2023-09-08 21:48:23] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 17, "learning_rate": 0.0001}, "parameter_index": 0}
[2023-09-08 21:48:23] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 20, "k": 21, "learning_rate": 0.0005}, "parameter_index": 0}
[2023-09-08 21:48:23] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 2, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 20, "k": 13, "learning_rate": 0.001}, "parameter_index": 0}
[2023-09-08 21:48:23] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 3, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 17, "learning_rate": 0.001}, "parameter_index": 0}
[2023-09-08 21:48:23] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 4, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 16, "learning_rate": 0.001}, "parameter_index": 0}
[2023-09-08 21:48:23] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 5, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 20, "k": 19, "learning_rate": 0.0005}, "parameter_index": 0}
[2023-09-08 21:48:23] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 6, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 18, "learning_rate": 0.001}, "parameter_index": 0}
[2023-09-08 21:48:23] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 7, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 16, "learning_rate": 0.0005}, "parameter_index": 0}
[2023-09-08 21:48:28] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 0,
 hyperParameters: {
   value: '{"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 17, "learning_rate": 0.0001}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 21:48:28] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 1,
 hyperParameters: {
   value: '{"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 20, "k": 21, "learning_rate": 0.0005}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 21:48:28] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 2,
 hyperParameters: {
   value: '{"parameter_id": 2, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 20, "k": 13, "learning_rate": 0.001}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 21:48:28] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 3,
 hyperParameters: {
   value: '{"parameter_id": 3, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 17, "learning_rate": 0.001}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 21:48:28] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 4,
 hyperParameters: {
   value: '{"parameter_id": 4, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 16, "learning_rate": 0.001}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 21:48:28] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 5,
 hyperParameters: {
   value: '{"parameter_id": 5, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 20, "k": 19, "learning_rate": 0.0005}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 21:48:28] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 6,
 hyperParameters: {
   value: '{"parameter_id": 6, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 18, "learning_rate": 0.001}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 21:48:28] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 7,
 hyperParameters: {
   value: '{"parameter_id": 7, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 16, "learning_rate": 0.0005}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 21:48:38] INFO (NNIManager) Trial job qMWBG status changed from WAITING to RUNNING
[2023-09-08 21:48:38] INFO (NNIManager) Trial job zcCbb status changed from WAITING to RUNNING
[2023-09-08 21:48:38] INFO (NNIManager) Trial job HCtVc status changed from WAITING to RUNNING
[2023-09-08 21:48:38] INFO (NNIManager) Trial job ABjjO status changed from WAITING to RUNNING
[2023-09-08 21:48:38] INFO (NNIManager) Trial job SP0Vs status changed from WAITING to RUNNING
[2023-09-08 21:48:38] INFO (NNIManager) Trial job sD2KG status changed from WAITING to RUNNING
[2023-09-08 21:48:38] INFO (NNIManager) Trial job HSWik status changed from WAITING to RUNNING
[2023-09-08 21:48:38] INFO (NNIManager) Trial job Wtx7j status changed from WAITING to RUNNING
[2023-09-08 22:15:26] INFO (NNIManager) Trial job HCtVc status changed from RUNNING to SUCCEEDED
[2023-09-08 22:15:26] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 8, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 18, "learning_rate": 0.0005}, "parameter_index": 0}
[2023-09-08 22:15:31] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 8,
 hyperParameters: {
   value: '{"parameter_id": 8, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 18, "learning_rate": 0.0005}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 22:15:36] INFO (NNIManager) Trial job cLUCA status changed from WAITING to RUNNING
[2023-09-08 22:17:19] INFO (NNIManager) Trial job Wtx7j status changed from RUNNING to SUCCEEDED
[2023-09-08 22:17:19] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 9, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 30, "k": 11, "learning_rate": 0.0005}, "parameter_index": 0}
[2023-09-08 22:17:24] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 9,
 hyperParameters: {
   value: '{"parameter_id": 9, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 30, "k": 11, "learning_rate": 0.0005}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 22:17:30] INFO (NNIManager) Trial job NvOlF status changed from WAITING to RUNNING
[2023-09-08 22:17:45] INFO (NNIManager) Trial job SP0Vs status changed from RUNNING to SUCCEEDED
[2023-09-08 22:17:45] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 10, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 30, "k": 19, "learning_rate": 0.0001}, "parameter_index": 0}
[2023-09-08 22:17:50] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 10,
 hyperParameters: {
   value: '{"parameter_id": 10, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 30, "k": 19, "learning_rate": 0.0001}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 22:17:55] INFO (NNIManager) Trial job p9B3c status changed from WAITING to RUNNING
[2023-09-08 22:18:00] INFO (NNIManager) Trial job HSWik status changed from RUNNING to SUCCEEDED
[2023-09-08 22:18:00] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 11, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 11, "learning_rate": 0.001}, "parameter_index": 0}
[2023-09-08 22:18:05] INFO (NNIManager) Trial job sD2KG status changed from RUNNING to SUCCEEDED
[2023-09-08 22:18:05] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 11,
 hyperParameters: {
   value: '{"parameter_id": 11, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 11, "learning_rate": 0.001}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 22:18:05] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 12, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 19, "learning_rate": 0.0005}, "parameter_index": 0}
[2023-09-08 22:18:10] INFO (NNIManager) Trial job TUgDG status changed from WAITING to RUNNING
[2023-09-08 22:18:10] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 12,
 hyperParameters: {
   value: '{"parameter_id": 12, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 19, "learning_rate": 0.0005}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 22:18:15] INFO (NNIManager) Trial job GHMqA status changed from WAITING to RUNNING
[2023-09-08 22:18:30] INFO (NNIManager) Trial job zcCbb status changed from RUNNING to SUCCEEDED
[2023-09-08 22:18:30] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 13, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 30, "k": 12, "learning_rate": 0.0005}, "parameter_index": 0}
[2023-09-08 22:18:35] INFO (NNIManager) Trial job ABjjO status changed from RUNNING to SUCCEEDED
[2023-09-08 22:18:35] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 13,
 hyperParameters: {
   value: '{"parameter_id": 13, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 30, "k": 12, "learning_rate": 0.0005}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 22:18:35] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 14, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 14, "learning_rate": 0.001}, "parameter_index": 0}
[2023-09-08 22:18:41] INFO (NNIManager) Trial job IYhpN status changed from WAITING to RUNNING
[2023-09-08 22:18:41] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 14,
 hyperParameters: {
   value: '{"parameter_id": 14, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 14, "learning_rate": 0.001}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 22:18:46] INFO (NNIManager) Trial job EbPy0 status changed from WAITING to RUNNING
[2023-09-08 22:26:30] INFO (NNIManager) Trial job qMWBG status changed from RUNNING to SUCCEEDED
[2023-09-08 22:26:30] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 15, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 20, "k": 15, "learning_rate": 0.0005}, "parameter_index": 0}
[2023-09-08 22:26:35] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 15,
 hyperParameters: {
   value: '{"parameter_id": 15, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 20, "k": 15, "learning_rate": 0.0005}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 22:26:40] INFO (NNIManager) Trial job cxlmg status changed from WAITING to RUNNING
[2023-09-08 22:43:43] INFO (NNIManager) Trial job cLUCA status changed from RUNNING to SUCCEEDED
[2023-09-08 22:43:44] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 16, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 15, "learning_rate": 0.0001}, "parameter_index": 0}
[2023-09-08 22:43:49] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 16,
 hyperParameters: {
   value: '{"parameter_id": 16, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 15, "learning_rate": 0.0001}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 22:43:54] INFO (NNIManager) Trial job LPQlu status changed from WAITING to RUNNING
[2023-09-08 22:44:25] INFO (NNIManager) Trial job NvOlF status changed from RUNNING to SUCCEEDED
[2023-09-08 22:44:25] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 17, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 17, "learning_rate": 0.0005}, "parameter_index": 0}
[2023-09-08 22:44:30] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 17,
 hyperParameters: {
   value: '{"parameter_id": 17, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 17, "learning_rate": 0.0005}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 22:44:35] INFO (NNIManager) Trial job KeQ1g status changed from WAITING to RUNNING
[2023-09-08 22:45:32] INFO (NNIManager) Trial job TUgDG status changed from RUNNING to SUCCEEDED
[2023-09-08 22:45:32] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 18, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 11, "learning_rate": 0.0005}, "parameter_index": 0}
[2023-09-08 22:45:37] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 18,
 hyperParameters: {
   value: '{"parameter_id": 18, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 11, "learning_rate": 0.0005}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 22:45:42] INFO (NNIManager) Trial job HdAvr status changed from WAITING to RUNNING
[2023-09-08 22:46:18] INFO (NNIManager) Trial job IYhpN status changed from RUNNING to SUCCEEDED
[2023-09-08 22:46:18] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 19, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 30, "k": 19, "learning_rate": 0.0005}, "parameter_index": 0}
[2023-09-08 22:46:23] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 19,
 hyperParameters: {
   value: '{"parameter_id": 19, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 30, "k": 19, "learning_rate": 0.0005}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 22:46:28] INFO (NNIManager) Trial job N1jtt status changed from WAITING to RUNNING
[2023-09-08 22:46:49] INFO (NNIManager) Trial job EbPy0 status changed from RUNNING to SUCCEEDED
[2023-09-08 22:46:49] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 20, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 21, "learning_rate": 0.001}, "parameter_index": 0}
[2023-09-08 22:46:54] INFO (NNIManager) submitTrialJob: form: {
 sequenceId: 20,
 hyperParameters: {
   value: '{"parameter_id": 20, "parameter_source": "algorithm", "parameters": {"fs_update_frequency": 15, "k": 21, "learning_rate": 0.001}, "parameter_index": 0}',
   index: 0
 },
 placementConstraint: { type: 'None', gpus: [] }
}
[2023-09-08 22:46:59] INFO (NNIManager) Trial job jZPGa status changed from WAITING to RUNNING
  • dispatcher.log:
[2023-09-08 21:48:23] INFO (nni.tuner.tpe/MainThread) Using random seed 1760022900
[2023-09-08 21:48:23] INFO (nni.runtime.msg_dispatcher_base/MainThread) Dispatcher started
[2023-09-08 22:18:00] INFO (nni.common.hpo_utils.dedup/Thread-1) Tuning algorithm generated duplicate parameter: {('fs_update_frequency',): 0, ('k',): 5, ('learning_rate',): 1}
[2023-09-08 22:18:00] INFO (nni.common.hpo_utils.dedup/Thread-1) Use grid search for deduplication.
[2023-09-08 22:18:00] INFO (nni.tuner.gridsearch/Thread-1) Grid initialized, size: (3×12×3) = 108
  • nnictl stdout and stderr:
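
To narrow down which trials should already have metrics in the web UI, the nnimanager.log lines above can be reduced to the latest status per trial job. The following is only a minimal sketch: the function name `latest_trial_status` and the regular expression are hypothetical helpers written against the log format shown in this issue, not part of NNI.

```python
import re
from collections import Counter

# Matches lines in the format seen in nnimanager.log above, e.g.
# [2023-09-08 22:15:26] INFO (NNIManager) Trial job HCtVc status changed from RUNNING to SUCCEEDED
STATUS_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] INFO \(NNIManager\) Trial job (?P<job>\S+) "
    r"status changed from (?P<old>\w+) to (?P<new>\w+)$"
)


def latest_trial_status(log_lines):
    """Return {trial_job_id: latest_status} from nnimanager.log lines.

    Hypothetical helper: lines that do not match the status-change
    format are ignored; later lines overwrite earlier ones.
    """
    status = {}
    for line in log_lines:
        match = STATUS_RE.match(line.strip())
        if match:
            status[match["job"]] = match["new"]
    return status


# A few lines copied from the log in this issue.
sample = [
    "[2023-09-08 21:48:38] INFO (NNIManager) Trial job qMWBG status changed from WAITING to RUNNING",
    "[2023-09-08 21:48:38] INFO (NNIManager) Trial job HCtVc status changed from WAITING to RUNNING",
    "[2023-09-08 22:15:26] INFO (NNIManager) Trial job HCtVc status changed from RUNNING to SUCCEEDED",
    "[2023-09-08 22:15:36] INFO (NNIManager) Trial job cLUCA status changed from WAITING to RUNNING",
]

statuses = latest_trial_status(sample)
summary = Counter(statuses.values())
for job, state in statuses.items():
    print(job, state)
```

Trials whose latest status is SUCCEEDED are the ones for which a final metric would presumably be expected in the web UI, so comparing that list against the .nni/metrics files may help show where the data is lost.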

How to reproduce it?:

Jia-py, Sep 08 '23 14:09