gmryu

Results 61 comments of gmryu

You may try

```
from fairseq import checkpoint_utils

checkpoint_utils.torch_persistent_save(state_dict, filename, async_write=False)
print(f"Finished saving checkpoint to {filename}")
```

Related info: [checkpoint_utils](https://github.com/facebookresearch/fairseq/blob/eda703798dcfde11c1ee517805c27e8698285d71/fairseq/checkpoint_utils.py#L549), [trainer's save_checkpoint](https://github.com/facebookresearch/fairseq/blob/eda703798dcfde11c1ee517805c27e8698285d71/fairseq/trainer.py#L433-L446). Is this what you need?

@robotsp I guess that will happen, since a model state dict alone is not a checkpoint for fairseq. I found you can use torch.save instead. The flow is: you load the...
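
(A minimal sketch of that flow, assuming the usual fairseq checkpoint layout with a `"model"` key; the file names and `my_state_dict` are placeholders:)

```
import torch

# load the full fairseq checkpoint (a plain dict wrapping the model state dict)
ckpt = torch.load("checkpoint_best.pt")

# swap in your own weights, keeping the rest of the checkpoint
# (optimizer state, config, ...) intact
ckpt["model"] = my_state_dict

# write it back with plain torch.save
torch.save(ckpt, "checkpoint_modified.pt")
```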

@robotsp I guess at this point, you do not load it as a model but load the checkpoint itself.

```
import torch

ckpt_state_dict = torch.load("{your old model checkpoint.pt}")
print(ckpt_state_dict)  # it should have something like...
```
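
(For orientation, a hedged sketch of inspecting that dict; the exact top-level keys vary between fairseq versions:)

```
import torch

ckpt = torch.load("checkpoint_best.pt")  # placeholder path
print(ckpt.keys())
# commonly something like:
# dict_keys(['cfg', 'model', 'optimizer_history', 'extra_state', 'last_optimizer_state'])
print(ckpt["model"].keys())  # the actual weight tensors live here
```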

@robotsp I gave the code before. Please at least tell me what happened when you tried it. I will just post the same thing again if you did not see it or...

@robotsp I have not pruned or distilled any models. But if pytorch provides methods to prune a module, I believe you can apply that method to the ["model"]["your weight"]. (It...
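
(A hedged sketch of magnitude-pruning one weight tensor inside a checkpoint by hand; the checkpoint path, the key name, and the 30% ratio are all placeholders:)

```
import torch

ckpt = torch.load("checkpoint_best.pt")
weight = ckpt["model"]["encoder.layers.0.fc1.weight"]

# zero the ~30% of entries with the smallest absolute value
k = max(1, int(weight.numel() * 0.3))
threshold = weight.abs().flatten().kthvalue(k).values
ckpt["model"]["encoder.layers.0.fc1.weight"] = torch.where(
    weight.abs() > threshold, weight, torch.zeros_like(weight)
)

torch.save(ckpt, "checkpoint_pruned.pt")
```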

@robotsp About that `dummy_input`, it is an example of an acceptable input for that model. For example, blenderbot in huggingface writes:

```
@property
def dummy_inputs(self):
    pad_token = self.config.pad_token_id
    input_ids = ...
```
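
(If the goal is tracing or export, a hedged sketch of how such a dummy input gets used; the token ids, shapes, and `model` are placeholders:)

```
import torch

# a made-up (batch, seq_len) batch of token ids, shaped like real input
dummy_input = torch.tensor([[0, 6, 10, 4, 2],
                            [0, 8, 12, 2, 1]])  # 1 standing in for the pad id

# e.g. trace the model with it (the same idea applies to torch.onnx.export)
traced = torch.jit.trace(model, dummy_input)
```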

@robotsp sorry, I cannot reply to you during my work hours. (Also, have you actually read everything I wrote? Is my phrasing too hard to understand?) Nonetheless, you may need to...

@robotsp if you want to keep a tensor from updating, I guess the simple way is to freeze it in the code directly. For example, you can copy a TransformerModel...
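
(A minimal sketch of freezing whole parameters that way, assuming you can reach them by name inside your copied model; the name pattern is a placeholder:)

```
# after the parameters exist (e.g. at the end of build_model),
# turn off gradients for the ones you want frozen
for name, param in model.named_parameters():
    if name.startswith("encoder.embed_tokens"):  # placeholder pattern
        param.requires_grad = False  # the optimizer will then skip these
```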

@robotsp check out this: https://discuss.pytorch.org/t/how-do-i-freeze-the-specific-weights-in-a-layer/104722/5

If you want to freeze only a slice of a single weight tensor, the short answer is that it is difficult and you need to modify fairseq-train. The previous link talks about...
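
(One common workaround in the spirit of that link, sketched with placeholder names; note this only masks gradients, so things like weight decay can still move the "frozen" entries depending on the optimizer:)

```
import torch

weight = model.encoder.layers[0].fc1.weight  # placeholder parameter

# 0 where the slice should stay frozen, 1 everywhere else
mask = torch.ones_like(weight)
mask[:10, :] = 0  # e.g. freeze the first 10 rows

# zero those gradient entries on every backward pass
weight.register_hook(lambda grad: grad * mask)
```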