Shenggui Li
Possible; in fact, we plan to use it to parallelize training as well. We will integrate it with Colossal-AI upon the official release of torch 2.0.
This is expected: `torchrun` cannot perform a graceful shutdown for now. It is not a bug.
This code runs only 20 steps on purpose since it is only a demo. You should be able to find in the code that the program will be stopped after...
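For reference, the step-capping pattern usually looks like the minimal sketch below (hypothetical names; the demo's actual code may differ):

```python
# Minimal sketch of the step-capping pattern (hypothetical names).
MAX_STEPS = 20

def train_step(batch):
    pass  # placeholder for the real training logic

for step, batch in enumerate(range(1000)):  # stand-in for a real dataloader
    train_step(batch)
    if step + 1 >= MAX_STEPS:
        break  # the demo deliberately stops here after 20 steps
```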
@feifeibear do we still need this CPU process group? Many users encounter environment issues when initializing this group.
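For context, this is the kind of group I mean; a minimal sketch, assuming the CPU group is a gloo-backend group created alongside the default NCCL one (names are illustrative):

```python
# Run with torchrun so MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE are set.
import torch.distributed as dist

dist.init_process_group(backend="nccl")  # default GPU (NCCL) process group
cpu_group = dist.new_group(backend="gloo")  # extra CPU (gloo) process group
```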
Hi @flymin, I have opened PR #2374 to enable runtime build, which should reduce the frustration during installation. Hope it helps.
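For illustration only, a runtime (JIT) build in PyTorch generally looks like the sketch below; the actual mechanism in the PR may differ, and the extension name and source file here are hypothetical:

```python
# Illustrative sketch of a runtime (JIT) build using PyTorch's built-in
# extension loader; the PR's actual mechanism may differ.
from torch.utils.cpp_extension import load

# Compiles the extension the first time it is loaded,
# instead of at `pip install` time.
my_ext = load(name="my_ext", sources=["my_ext.cpp"], verbose=True)
```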
Should this PR be closed?
You can download ColossalAI directly from https://www.colossalai.org/download. We use our own pip source, so don't run `pip install colossalai` to install from the public PyPI. If you have issues...
Hi, can you print the type of the input `text`?
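Something like this quick check would help (assuming `text` is the input you pass in):

```python
text = ...  # replace with your actual input
print(type(text))  # e.g. <class 'str'> vs. <class 'list'>
```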
This issue has been stale for a long time. Global batch size = data parallel size * num_micro_batch * micro_batch_size.
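A quick worked example with made-up numbers:

```python
# Hypothetical numbers, just to illustrate the formula.
data_parallel_size = 4
num_micro_batch = 8
micro_batch_size = 2

global_batch_size = data_parallel_size * num_micro_batch * micro_batch_size
print(global_batch_size)  # 4 * 8 * 2 = 64
```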
It would be great if multiprocessing could be supported. I think integrating with `pytest-cov` would be a good move.
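A minimal sketch of coverage.py's documented multiprocessing settings, assuming a `.coveragerc` at the project root; pytest-cov then combines the per-process data files:

```ini
# .coveragerc: enable coverage in multiprocessing workers
[run]
concurrency = multiprocessing
parallel = True
```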