wanziyu

6 issues by wanziyu

Currently, notebooks are supported; it would be good to add [code-server](https://github.com/coder/code-server) as well.

## Description
Please add more torch elastic training examples, such as BERT model training for natural language processing.
## Motivation/Background
We cannot find any other torch elastic examples.
## Alternatives
## Additional...

Training large DL models on edge devices is infeasible because of their limited computing resources. In a decentralized distributed deep learning system, workers exchange local gradients with each other, and update...

kind/question

### Ⅰ. Describe what this PR does
The PR designs elastic training APIs, adds a torch-elastic controller, and implements the elastic training control flow in both the torch-elastic controller and the PyTorch controller. Currently,...

In train.py, I see a central agent, an SL agent, and RL agents. They run on different CPU cores using the multiprocessing package, and the RL agents get the weights of the policy and...

### PR types
Feature support
### PR changes
OPs
### Describe
Support distributed jobs in the pipeline.

contributor
status: proposed