robotsp

Results: 18 issues by robotsp

**Is your feature request related to a problem? Please describe.** _No response_ **Solutions** I hope the author can provide the pretraining code. **Additional context** _No response_

**Your question** How can we accurately profile bubble time in pipeline parallelism?
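For the bubble-time question above, a measured number is easier to judge against the analytical baseline from the Megatron-LM pipeline-parallelism paper (Narayanan et al., 2021): with `p` stages and `m` microbatches, the 1F1B schedule's ideal bubble fraction (bubble time divided by ideal compute time) is (p − 1)/m, and an interleaved schedule with `v` model chunks per device divides this by `v`. The sketch below is a minimal illustration, not Megatron-LM code: both function names are hypothetical, and it assumes balanced stages and that `t_fwd` / `t_bwd` are per-microbatch forward/backward times measured on one stage (for example with CUDA events).

```python
def ideal_bubble_fraction(p: int, m: int, v: int = 1) -> float:
    """Hypothetical helper: ideal bubble time / ideal compute time.

    p: pipeline stages, m: microbatches per step,
    v: model chunks per device (v > 1 means an interleaved schedule).
    """
    if min(p, m, v) < 1:
        raise ValueError("p, m and v must all be >= 1")
    return (p - 1) / (v * m)


def measured_bubble_fraction(step_time: float, m: int,
                             t_fwd: float, t_bwd: float) -> float:
    """Hypothetical helper: idle time relative to useful compute on one stage.

    Assumes balanced stages and that t_fwd / t_bwd are this stage's
    per-microbatch forward / backward times, so its useful work per step is
    m * (t_fwd + t_bwd); everything else in step_time counts as bubble
    (communication stalls included).
    """
    compute = m * (t_fwd + t_bwd)
    return (step_time - compute) / compute


if __name__ == "__main__":
    p, m, t_f, t_b = 4, 8, 1.0, 2.0
    # A perfectly balanced, non-interleaved pipeline takes
    # (m + p - 1) * (t_f + t_b) per step, so both estimates agree.
    ideal_step = (m + p - 1) * (t_f + t_b)
    print(ideal_bubble_fraction(p, m))                        # 0.375
    print(measured_bubble_fraction(ideal_step, m, t_f, t_b))  # 0.375
```

If the measured fraction is well above the ideal one, the excess points to stage imbalance or communication stalls rather than the schedule itself.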

**Your question** I saw that schedules.py contains a microbatch-level activation checkpointing implementation of https://arxiv.org/pdf/2205.05198.pdf, but I do not know how to enable...

**Your question** How can I release the model and optimizer memory manually? **What I have tried** - calling zero_grad() - setting the references to None - using del - gc.collect() - torch.cuda.empty_cache()...
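A common reason the steps listed above appear to do nothing is that some other object still references the tensors: the optimizer's `param_groups` hold the model's parameters, so deleting only the model keeps its weights alive, and `torch.cuda.empty_cache()` can only return cached blocks that nothing references. The sketch below is a minimal illustration, assuming a plain PyTorch model and optimizer (framework wrappers may keep additional references of their own); `build_and_step` and `release_demo` are hypothetical names, and weak references are used only to check that the objects were really freed.

```python
import gc
import weakref

import torch
import torch.nn as nn


def build_and_step(device: str) -> tuple[nn.Module, torch.optim.Optimizer]:
    """Hypothetical helper: toy model plus one Adam step (allocates state)."""
    model = nn.Linear(256, 256).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss = model(torch.randn(8, 256, device=device)).sum()
    loss.backward()
    optimizer.step()  # Adam now holds exp_avg / exp_avg_sq per parameter
    return model, optimizer


def release_demo(device: str = "cpu") -> bool:
    """Return True if both the module and its weight were actually freed."""
    model, optimizer = build_and_step(device)
    model_ref = weakref.ref(model)
    weight_ref = weakref.ref(model.weight)

    optimizer.zero_grad(set_to_none=True)  # drop the .grad tensors
    optimizer.state.clear()                # drop Adam's moment buffers
    # The optimizer also references the parameters, so BOTH names must go;
    # `del model` alone would leave the weights alive.
    del model, optimizer
    gc.collect()  # collect any reference cycles

    if device == "cuda":
        torch.cuda.empty_cache()  # return unused cached blocks to the driver
        print("allocated bytes:", torch.cuda.memory_allocated())

    return model_ref() is None and weight_ref() is None


if __name__ == "__main__":
    print(release_demo("cuda" if torch.cuda.is_available() else "cpu"))
```

Any remaining reference (a stored loss tensor, model outputs, a scheduler, or a checkpoint dict) will keep memory allocated in the same way, so those names need to be dropped too.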

**Your question** How can I initialize the process group twice within a single torchrun bash script? I tried to destroy the original one and reinitialize it...
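For the re-initialization question above, the order that `torch.distributed` expects is destroy, then init again. The sketch below is a minimal single-process illustration, assuming the `gloo` backend and a fresh in-memory `dist.HashStore()` for each initialization so it runs without torchrun or a TCP port; under torchrun the same destroy-then-init order applies with the usual `env://` rendezvous. The helper names are hypothetical. If Megatron-LM is involved, its model-parallel groups are also cached in `megatron.core.parallel_state`, which provides `destroy_model_parallel()` for resetting them (check this against your Megatron-LM version).

```python
import torch
import torch.distributed as dist


def init_single_process_group() -> None:
    """Hypothetical helper: world_size=1 group backed by an in-memory store."""
    # A new HashStore per init avoids reusing state from the previous group.
    dist.init_process_group(
        backend="gloo", store=dist.HashStore(), rank=0, world_size=1
    )


def reinitialize_twice() -> list[bool]:
    """Init, use, and destroy the default group twice in one process."""
    states = []
    for _ in range(2):
        init_single_process_group()
        states.append(dist.is_initialized())
        t = torch.ones(1)
        dist.all_reduce(t)  # with a single rank the sum is unchanged
        dist.destroy_process_group()  # tears down the default group
        states.append(dist.is_initialized())
    return states


if __name__ == "__main__":
    print(reinitialize_twice())
```

With real multi-GPU runs, every rank has to call `destroy_process_group()` before any rank re-initializes, otherwise the new rendezvous can hang.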

I checked that there is a pretrained model in the repo "https://github.com/rewicks/ersatz-models/tree/main/monolingual/en". Since I cannot find the tokenizer vocabulary, I am not sure how to fine-tune the existing model.

I tried to create a training dataset, but there is an error: File "dataset.py", line 271, in main() File "dataset.py", line 267, in main determiner=determiner) File "dataset.py", line 66, in...

@rewicks @mjpost Dear contributors, I ran into an error while training the model: Traceback (most recent call last): File "trainer.py", line 419, in main() File "trainer.py", line 93, in...