kideng

Results 3 issues of kideng

### Current Behavior

I deployed the wandb server locally. It prompted me to upgrade, so I pulled the latest version, restarted, and got the following bug. Here you can see...

app

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select) The cause is that in the GLMModel class `...

When running the example script (ZeRO3 optim offload), I encountered a ray.exceptions.OutOfMemoryError during training. Specifically, this error occurred at the 32nd step when gradient_accumulation_steps was set to 32. 1600GB memory...