I think pipeline_model_parallel_size == 2 is acceptable in practice, but maybe with little or no benefit in reducing the pipeline bubble?
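For context, a minimal sketch of the usual bubble-fraction estimate from the GPipe/1F1B-style analyses (the function and numbers below are my own illustration, not code from this repo), which suggests p == 2 still pipelines but gains little over p == 1 unless the microbatch count is large:

```python
# Rough estimate: with p pipeline stages and m microbatches, the idle "bubble"
# fraction of a step is roughly (p - 1) / (m + p - 1).
def bubble_fraction(pipeline_stages: int, num_microbatches: int) -> float:
    p, m = pipeline_stages, num_microbatches
    return (p - 1) / (m + p - 1)

print(bubble_fraction(2, 8))   # ~0.11 of the step is idle
print(bubble_fraction(8, 8))   # ~0.47 of the step is idle
```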
> What's the use case for releasing the model memory? Trying to delete the optimizer object might help with releasing the optimizer memory (so something like `del optimizer` in `megatron/training.py`)....
Actually, the use case is that I need to change the original neural network structure, which is why I want to release the model and optimizer memory from the original one. I...
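For what it's worth, a minimal sketch of what I mean by releasing the old model and optimizer (assuming plain PyTorch objects; the `Linear` model here is just a hypothetical stand-in for what `megatron/training.py` builds):

```python
import gc
import torch

# Hypothetical stand-ins for the objects built during the first (dummy) training run.
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.Adam(model.parameters())

# Drop every Python reference to the old model/optimizer so their CUDA tensors
# become unreachable, then return PyTorch's cached blocks to the GPU allocator.
del optimizer
del model
gc.collect()
torch.cuda.empty_cache()

# ...at this point a new model/optimizer with a different structure can be built.
```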
@deepakn94 One more question: I found that when I trained the model a second time after the dummy first training, its loss curve was different from the one in the training...
> Yes. What you did is hard label distillation @kkeleve
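In case it helps anyone reading along, a minimal toy sketch of what hard-label distillation means (my own example, not code from this repo): the student is trained with plain cross-entropy against the teacher's argmax predictions instead of soft probabilities.

```python
import torch
import torch.nn.functional as F

def hard_label_distill_loss(student_logits, teacher_logits):
    # Hard-label distillation: use the teacher's argmax as pseudo-labels and
    # train the student with ordinary cross-entropy (no softened distributions).
    pseudo_labels = teacher_logits.argmax(dim=-1)
    return F.cross_entropy(student_logits, pseudo_labels)

# Toy usage: random logits for a batch of 4 examples over 10 classes.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
loss = hard_label_distill_loss(student_logits, teacher_logits)
loss.backward()
```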
Same here. @linhkid @sugeeth14
> @linhkid @myleott Did you solve the problem? I have the same issue here.
@scarydemon2 I have the same problem. Do we need to modify the code in the reward model's `forward_value` function?
@ibtiRaj have you solved your problem?
> @robotsp No, I didn't, I'm sorry.

No worries. BTW, may I ask whether the model file and vocab file in your configs are the same as the original ones...