shileims
When I use the same booster setting for another model, it works very well.

```python
def main(args):
    if args.seed is None:
        colossalai.launch_from_torch(config={})
    else:
        colossalai.launch_from_torch(config={}, seed=args.seed)
    global_rank = dist.get_rank()
    # local_rank =...
```
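For reference, a minimal sketch of what a launch-plus-booster setup like this typically looks like, assuming a ColossalAI version that ships the `Booster`/`GeminiPlugin` API; the model, optimizer, and hyperparameters below are placeholders, not the ones from the failing run:

```python
import torch.nn as nn
import torch.distributed as dist
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam

def main(seed=None):
    # Launched with torchrun, so rank/world size come from the environment.
    if seed is None:
        colossalai.launch_from_torch(config={})
    else:
        colossalai.launch_from_torch(config={}, seed=seed)
    global_rank = dist.get_rank()

    model = nn.Linear(1024, 1024)
    optimizer = HybridAdam(model.parameters(), lr=1e-3)
    criterion = nn.MSELoss()

    # The booster wraps model/optimizer so that Gemini manages the sharded state.
    booster = Booster(plugin=GeminiPlugin())
    model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)

if __name__ == "__main__":
    main()
```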
Hi @JThh , Thanks for your reply. I would like to ask how to save and load the lr_scheduler? I think the example you pointed out doesn't include an lr_scheduler example. Thanks
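For what it's worth, a minimal sketch of lr_scheduler checkpointing, assuming a standard `torch.optim.lr_scheduler` object; the rank-0 save and the file path are assumptions, not something from the linked example:

```python
import torch
import torch.distributed as dist

def save_lr_scheduler(lr_scheduler, path="lr_scheduler.pt"):
    # The scheduler state is identical on every rank, so one rank writing it is enough.
    if dist.get_rank() == 0:
        torch.save(lr_scheduler.state_dict(), path)
    dist.barrier()  # make sure the file is on disk before anyone reads it back

def load_lr_scheduler(lr_scheduler, path="lr_scheduler.pt"):
    # Every rank restores the same state so the step counters stay in sync.
    lr_scheduler.load_state_dict(torch.load(path, map_location="cpu"))
```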
Hi @JThh , Thanks for your reply. Another question: I found that the code you pointed me to shows the saved optimizer is loaded only by the main process (local_rank...
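A minimal sketch of the all-rank loading pattern, assuming the checkpoint is a full (unsharded) optimizer state dict visible to every rank; whether `load_state_dict` on the Gemini/ZeRO optimizer accepts the full dict is version-dependent:

```python
import torch
import torch.distributed as dist

def load_optimizer_on_all_ranks(optimizer, path="optimizer.pt"):
    # If only the main process loads, the other ranks keep a fresh, empty
    # optimizer state. A sharded optimizer owns part of the state on each
    # rank, so every rank needs to run load_state_dict itself.
    state = torch.load(path, map_location="cpu")
    optimizer.load_state_dict(state)
    dist.barrier()
```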
Hi @JThh , I really appreciate your help, thanks!
Hi @JThh , I used the following code to save the optimizer, but it shows a timeout error:

**Code**:
```python
rank = dist.get_rank()
mapping = dict()
optim_state = optimizer.state_dict()
for k, v in...
```
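A guess at the cause, with a hedged sketch: on a sharded (ZeRO/Gemini) optimizer, `state_dict()` usually involves collective communication to gather the shards, so if only rank 0 calls it the other ranks block until the NCCL timeout fires. The sketch below assumes that behaviour and only changes which rank writes the file:

```python
import torch
import torch.distributed as dist

def save_optimizer(optimizer, path="optimizer.pt"):
    # state_dict() may be a collective op on a sharded optimizer, so every
    # rank must call it, even though only one rank ends up writing the file.
    optim_state = optimizer.state_dict()
    if dist.get_rank() == 0:
        torch.save(optim_state, path)
    dist.barrier()
```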
Hi @JThh , Thanks for your reply. Actually, I used the following to define the optimizer:

```python
from colossalai.nn.optimizer.gemini_optimizer import GeminiAdamOptimizer
```

So I think it is a member of...
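A quick way to check what `GeminiAdamOptimizer` actually is, without relying on version-specific import paths for its base classes, is to print its method resolution order; a ZeRO wrapper class is expected to show up there:

```python
from colossalai.nn.optimizer.gemini_optimizer import GeminiAdamOptimizer

# Walk the class hierarchy; the ZeRO optimizer wrapper should appear as a base.
for cls in GeminiAdamOptimizer.__mro__:
    print(cls)
```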
Hi @JThh , While loading the saved optimizer, it shows the following error:

**Error**: ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group...
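A small diagnostic sketch for this error, assuming the checkpoint follows the standard `torch.optim` state-dict layout (`param_groups` with a `params` index list per group); the path is a placeholder:

```python
import torch

def compare_param_groups(optimizer, ckpt_path="optimizer.pt"):
    # The ValueError means the param groups in the checkpoint don't line up
    # with the optimizer they are loaded into, e.g. because the state was
    # saved per rank from a sharded optimizer and each rank saw a different
    # subset of parameters.
    saved = torch.load(ckpt_path, map_location="cpu")
    live = optimizer.state_dict()
    print("saved group sizes:", [len(g["params"]) for g in saved["param_groups"]])
    print("live  group sizes:", [len(g["params"]) for g in live["param_groups"]])
```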
Hi @JThh , Thanks for your reply. The info is the same as what I listed at the beginning of the error:

GPU1: saved optimizer['params'] [0, 1, 2, 3, 4,...
Hi @JThh , Sorry for the late reply. I found that sometimes saving and loading the optimizer works, but sometimes it doesn't. Normally, if I use 16 GPUs to train...
Hi @JThh , I met the same issue. Would you give me an example of saving and loading ZeroOptimizer? Thanks
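A hedged end-to-end sketch of saving and loading a ZeRO (Gemini) optimizer through the Booster checkpoint API, assuming a ColossalAI version where `Booster.save_optimizer`/`Booster.load_optimizer` exist; the model, hyperparameters, and file name are placeholders:

```python
import torch.nn as nn
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam

def main():
    colossalai.launch_from_torch(config={})

    model = nn.Linear(1024, 1024)
    optimizer = HybridAdam(model.parameters(), lr=1e-3)
    criterion = nn.MSELoss()

    booster = Booster(plugin=GeminiPlugin())
    model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)

    # ... train for a while ...

    # Every rank participates; the plugin's CheckpointIO deals with the shards.
    booster.save_optimizer(optimizer, "optimizer.pt")
    booster.load_optimizer(optimizer, "optimizer.pt")

if __name__ == "__main__":
    main()
```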