BhAem
Why is the contrastive regularization in the paper different from the code?
What is the role of GRes? If I remove this module, what effect does it have on the result?
Why do the hardware configurations consider only the number of CPU cores and the number of accelerators, without taking the server's memory size into account?
In deepspeed.utils.timer.py, what is the difference between the items below: `BACKWARD_MICRO_TIMER = 'bwd_microstep' BACKWARD_GLOBAL_TIMER = 'bwd' BACKWARD_INNER_MICRO_TIMER = 'bwd_inner_microstep' BACKWARD_INNER_GLOBAL_TIMER = 'bwd_inner' BACKWARD_REDUCE_MICRO_TIMER = 'bwd_allreduce_microstep' BACKWARD_REDUCE_GLOBAL_TIMER = 'bwd_allreduce'`? Can I choose...
` def _state(self, label_job_id, role="worker"): # whether this action selection leads to worker increment or ps increment # cluster_state = self.cluster.get_cluster_state() input = self.observe() # NN input label = np.zeros(pm.ACTION_DIM)...