Shenggui Li

142 comments of Shenggui Li

Possible; in fact, we plan to use it to parallelize training as well. We will integrate it with Colossal-AI upon the official release of torch 2.0.

This is expected because `torchrun` cannot perform a graceful shutdown for now. It is not a bug.

This code only runs 20 steps on purpose, since it is only a demo. You should be able to find in the code that the program will be stopped after...

@feifeibear do we still need this CPU process group? Many users encounter environment issues when initializing this group.

Hi @flymin , I have set up a PR #2374 to enable runtime build to reduce the frustration during installation. Hope it can help.

You can download ColossalAI directly from https://www.colossalai.org/download . We use our own pip source, so don't run `pip install colossalai`, which installs from the public PyPI. If you have issues...

Hi, can you print the type of the input `text`?

This issue has been stale for a long time. Global batch size = data parallel size * num_micro_batch * micro_batch_size.
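To make the formula above concrete, here is a minimal sketch in Python (the function name is hypothetical, not part of the Colossal-AI API) showing how the global batch size is derived from the three factors:

```python
def global_batch_size(dp_size: int, num_micro_batch: int, micro_batch_size: int) -> int:
    """Total samples consumed per optimizer step across all data-parallel ranks.

    Each of the `dp_size` data-parallel replicas processes `num_micro_batch`
    micro-batches of `micro_batch_size` samples before the optimizer steps.
    """
    return dp_size * num_micro_batch * micro_batch_size

# e.g. 4 data-parallel ranks, 8 micro-batches, 2 samples per micro-batch:
print(global_batch_size(4, 8, 2))  # 64
```

So with pipeline parallelism, increasing the number of micro-batches raises the effective global batch size even though each micro-batch stays small.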

It would be great if multiprocessing could be supported. I think integrating with `pytest-cov` would be a good move.