robotsp

Results: 51 comments by robotsp

I think `pipeline_model_parallel_size == 2` is acceptable in practice, but it may bring little or no benefit in reducing the pipeline bubble?
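For intuition, here is a back-of-the-envelope sketch using the idealized bubble estimate from the Megatron-LM paper (Narayanan et al., 2021). With p pipeline stages and m micro-batches, the 1F1B bubble fraction is (p-1)/m, and an interleaved schedule with v model chunks per stage divides it by v. The function name and the choice m = 8, v = 2 are illustrative assumptions, and the estimate ignores the extra communication that interleaving adds.

```python
from fractions import Fraction


def bubble_fraction(p: int, m: int, v: int = 1) -> Fraction:
    """Idealized bubble time relative to ideal compute time.

    p: number of pipeline stages (pipeline_model_parallel_size)
    m: number of micro-batches per global batch
    v: model chunks per stage (v > 1 models an interleaved schedule)
    """
    return Fraction(p - 1, v * m)


# Compare plain 1F1B with a 2-chunk interleaved schedule for a few stage counts.
for p in (2, 4, 8):
    plain = bubble_fraction(p, m=8)
    interleaved = bubble_fraction(p, m=8, v=2)
    print(
        f"p={p}: 1F1B={float(plain):.4f} "
        f"interleaved(v=2)={float(interleaved):.4f} "
        f"saved={float(plain - interleaved):.4f}"
    )
```

Under these assumptions the idealized saving at p = 2 is only 1/16 of the ideal compute time, versus 7/16 at p = 8, so whatever interleaving buys is much smaller in absolute terms with two stages.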

> What's the use case for releasing the model memory? Trying to delete the optimizer object might help with releasing the optimizer memory (so something like `del optimizer` in `megatron/training.py`)....

Actually, the use case requires changing the original neural network structure, which is why I want to release the memory held by the original model and optimizer. I...
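A minimal pure-Python sketch of why deleting the optimizer matters. An object is only freed once every reference to it is gone, and an optimizer keeps its own references to the parameters. `Params`, `Model`, `Optimizer` and `demo` are hypothetical stand-ins, not Megatron or PyTorch classes.

```python
import gc
import weakref


class Params:
    """Hypothetical stand-in for a model's parameter tensors."""


class Model:
    def __init__(self):
        self.params = Params()


class Optimizer:
    def __init__(self, params):
        # Like a real optimizer, this keeps its own reference to the params.
        self.param_groups = [{"params": [params]}]


def demo():
    model = Model()
    optimizer = Optimizer(model.params)
    probe = weakref.ref(model.params)  # observes the params without keeping them alive

    del model
    gc.collect()
    alive_after_model = probe() is not None  # still alive: the optimizer holds it

    del optimizer
    gc.collect()
    alive_after_optimizer = probe() is not None  # freed once no references remain

    return alive_after_model, alive_after_optimizer


print(demo())
```

In real PyTorch code the same rule applies to parameter tensors. After dropping every reference (model, optimizer, and anything else that holds them), calling `gc.collect()` and then `torch.cuda.empty_cache()` typically returns the cached GPU blocks to the driver.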

@deepakn94 One more question: when I trained the model a second time, after a dummy first training run, its loss curve was different from the one in the training...

> Yes. What you did is hard label distillation @kkeleve
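For reference, a small sketch contrasting the two kinds of distillation loss, assuming plain Python lists of logits. Hard-label distillation trains on the teacher's argmax as an ordinary class label, while soft-label distillation (as in Hinton et al., 2015) matches the teacher's temperature-softened distribution. The helper names and the default temperature T = 2 are illustrative assumptions.

```python
import math


def softmax(logits, T=1.0):
    # Numerically stable softmax with temperature T.
    m = max(logits)
    exps = [math.exp((x - m) / T) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def hard_label_loss(student_logits, teacher_logits):
    # Hard-label distillation: the teacher's argmax becomes an ordinary class
    # label, and the student is trained with plain cross-entropy on it.
    target = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    return -math.log(softmax(student_logits)[target])


def soft_label_loss(student_logits, teacher_logits, T=2.0):
    # Soft-label distillation: KL(teacher || student) on temperature-softened
    # distributions, scaled by T^2 to keep gradient magnitudes comparable.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The hard-label loss only sees which class the teacher ranks first, so two teachers with the same argmax give the same training signal; the soft-label loss also uses how the teacher ranks the remaining classes.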

Same here. @linhkid @sugeeth14

> @linhkid @myleott Did you solve the problem? I have the same issue here.

@scarydemon2 I have the same problem. Do we need to modify the `forward_value` function in `reward_model`?

@ibtiRaj have you solved your problem?

> @robotsp No, I didn't, I'm sorry.

No worries. By the way, may I ask about the model file and vocab file in your configs: are they the same as the original ones...