tangchb

Results 5 comments of tangchb

> Hi, I have never encountered this issue. `colossalai run` is a wrapper for `torchrun` such that we can launch multi-node training from one node. You can try to...

Hello, I have another question. In `partition_parameters.py`, the function `apply_with_gather()` contains similar code: `dist.broadcast(param.data, 0, group=param.ds_process_group)`. Isn't this okay?

> @guoday Thanks for the details above. It was quite helpful. One follow-up question.
>
> Do you take care of cycles that may appear in the dependency graph...

You can read `_extra_state` with code like this instead of calling `state.read()`. This shows the contents of `_extra_state`:

```python
import io
import torch

if isinstance(state, io.BytesIO):
    state.seek(0)  # rewind the buffer before deserializing
    state = torch.load(state)
```
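To try this outside a real checkpoint, here is a minimal round-trip sketch. The helper name `load_extra_state` and the sample dictionary are hypothetical and only for illustration; the assumption is that `_extra_state` was written into an `io.BytesIO` buffer with `torch.save`.

```python
import io

import torch


def load_extra_state(state):
    """Deserialize `state` if it is a BytesIO buffer; otherwise return it unchanged.

    Hypothetical helper wrapping the check from the comment above.
    """
    if isinstance(state, io.BytesIO):
        # After writing, the buffer position sits at the end, so rewind first.
        state.seek(0)
        state = torch.load(state)
    return state


# Build a stand-in for a serialized _extra_state (sample data, not from a real model).
buffer = io.BytesIO()
torch.save({"scale": torch.tensor([1.0, 2.0])}, buffer)

restored = load_extra_state(buffer)
print(restored)
```

Values that are not `io.BytesIO` objects pass through untouched, so the helper can be applied to every entry without checking the type first.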