I think pipeline_model_parallel_size == 2 is acceptable in practice, but maybe with little or no benefit in reducing the pipeline bubble?
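For context, a minimal sketch of the usual bubble-fraction estimate from the GPipe/1F1B-style analyses (the function and numbers below are my own illustration, not code from this repo), which suggests p == 2 still pipelines but gains little over p == 1 unless the microbatch count is large:

```python
# Rough estimate: with p pipeline stages and m microbatches, the idle "bubble"
# fraction of a step is roughly (p - 1) / (m + p - 1).
def bubble_fraction(pipeline_stages: int, num_microbatches: int) -> float:
    p, m = pipeline_stages, num_microbatches
    return (p - 1) / (m + p - 1)

print(bubble_fraction(2, 8))   # ~0.11 of the step is idle
print(bubble_fraction(8, 8))   # ~0.47 of the step is idle
```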
> What's the use case for releasing the model memory? Trying to delete the optimizer object might help with releasing the optimizer memory (so something like `del optimizer` in `megatron/training.py`)....
Actually, the use case is that I need to change the original neural network structure, which is why I want to release the model and optimizer memory from the original one. I...
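For what it's worth, a minimal sketch of what I mean by releasing the old model and optimizer (assuming plain PyTorch objects; the `Linear` model here is just a hypothetical stand-in for what `megatron/training.py` builds):

```python
import gc
import torch

# Hypothetical stand-ins for the objects built during the first (dummy) training run.
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.Adam(model.parameters())

# Drop every Python reference to the old model/optimizer so their CUDA tensors
# become unreachable, then return PyTorch's cached blocks to the GPU allocator.
del optimizer
del model
gc.collect()
torch.cuda.empty_cache()

# ...at this point a new model/optimizer with a different structure can be built.
```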
@deepakn94 One more question: I found that when I trained the model a second time after the dummy first training, its loss curve was different from the one in the training...
> Yes. What you did is hard label distillation @kkeleve
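In case it helps anyone reading along, a minimal toy sketch of what hard-label distillation means (my own example, not code from this repo): the student is trained with plain cross-entropy against the teacher's argmax predictions instead of soft probabilities.

```python
import torch
import torch.nn.functional as F

def hard_label_distill_loss(student_logits, teacher_logits):
    # Hard-label distillation: use the teacher's argmax as pseudo-labels and
    # train the student with ordinary cross-entropy (no softened distributions).
    pseudo_labels = teacher_logits.argmax(dim=-1)
    return F.cross_entropy(student_logits, pseudo_labels)

# Toy usage: random logits for a batch of 4 examples over 10 classes.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
loss = hard_label_distill_loss(student_logits, teacher_logits)
loss.backward()
```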
Same here. @linhkid @sugeeth14
> @linhkid @myleott Did you solve the problem? I have the same issue here.
@scarydemon2 I have the same problem. Do we need to modify the code in the reward model's `forward_value` function?
@ibtiRaj have you solved your problem?
> @robotsp No, I didn't, I'm sorry.

No worries. BTW, may I ask whether the model file and vocab file in your configs are the same as the original ones...