DL2

A deep learning-driven scheduler for elastic training in deep learning clusters

Results: 4 DL2 issues

In train.py, I see a central agent, an SL agent, and RL agents. They run on different CPU cores using the multiprocessing package. And the RL agents get the weights of the policy and...

Hello, I've been studying your DL2 paper and code recently, and I have a few questions I'd like to ask you! Currently, my environment is Python 2.7 and TensorFlow-GPU 1.15....

Can someone explain how to deal with the last state in the i-th scheduling and the first state in the (i+1)-th scheduling? How should they be combined into an RL sample...
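To make the question concrete, here is a minimal sketch of one generic way to build RL samples across such a boundary: treat consecutive schedulings as one continuous trajectory, so the last state of scheduling i takes the first state of scheduling i+1 as its next state. This is an assumption for illustration only, not taken from the DL2 code; `Transition` and `stitch_rounds` are hypothetical names.

```python
from typing import Any, List, NamedTuple, Optional, Tuple

Step = Tuple[Any, Any, float]  # (state, action, reward)


class Transition(NamedTuple):
    state: Any
    action: Any
    reward: float
    next_state: Optional[Any]  # None marks the end of the whole trajectory


def stitch_rounds(rounds: List[List[Step]]) -> List[Transition]:
    """Hypothetical helper: chain per-scheduling steps into (s, a, r, s') samples."""
    # Flatten all schedulings into one ordered trajectory.
    steps = [step for rnd in rounds for step in rnd]
    transitions = []
    for i, (state, action, reward) in enumerate(steps):
        # The successor is simply the next recorded state, even when it
        # belongs to the following scheduling.
        next_state = steps[i + 1][0] if i + 1 < len(steps) else None
        transitions.append(Transition(state, action, reward, next_state))
    return transitions


# Two schedulings: the first has two steps, the second has one.
rounds = [
    [("s0", "a0", 0.0), ("s1", "a1", 1.0)],
    [("s2", "a2", 0.5)],
]
samples = stitch_rounds(rounds)
```

Here `samples[1]` pairs the last state of the first scheduling (`"s1"`) with the first state of the second (`"s2"`). Whether the boundary should instead be treated as terminal depends on how the scheduler defines an episode.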

```python
def _state(self, label_job_id, role="worker"):
    # whether this action selection leads to worker increment or ps increment
    # cluster_state = self.cluster.get_cluster_state()
    input = self.observe()  # NN input
    label = np.zeros(pm.ACTION_DIM)...
```