li-yang23

Results 4 comments of li-yang23

`model.layers.xx.mlp.calculator.experts.parametrizations.weight.original0ate.xx` is not described in `pytorch_model.bin.index.json`; a configuration problem, maybe?
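One way to confirm a mismatch like this is to diff the model's state-dict keys against the `weight_map` in the shard index. A minimal sketch, assuming you already have the list of model keys; `missing_from_index` is a hypothetical helper, not part of transformers:

```python
import json

def missing_from_index(model_keys, index_path):
    # Hypothetical helper: report state-dict keys that the shard index
    # (pytorch_model.bin.index.json) does not map to any shard file.
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]  # key -> shard filename
    return sorted(set(model_keys) - set(weight_map))
```

Calling it with `model.state_dict().keys()` and the path to the index file lists every parameter the checkpoint cannot supply.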

Thank you for your reply. I use transformers 4.47.1 and Python 3.11.6. The same problem happened with both Llama-MoE-v1-3_0B-2_16 and Llama-MoE-v1-3_5B-2_8, so I printed them both. The structure of Llama-MoE-v1-3_0B-2_16...

I found that the potential problem is the `configure_optimizers` function. I changed

```python
optimizer = torch.optim.AdamW(self.llm.model.parameters(), lr=0.0002, weight_decay=0.0, betas=(0.9, 0.95))
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: step / warmup_steps)
```

to ```python...
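The replacement code is truncated above, but one issue with the original schedule is that `step / warmup_steps` keeps growing past 1.0, so the learning rate never stops increasing after warmup. A minimal sketch of a clamped linear warmup, assuming a placeholder `warmup_steps` value (the original comment does not show it):

```python
import torch

def warmup_factor(step: int, warmup_steps: int) -> float:
    # Linear warmup that plateaus at 1.0 instead of growing without bound.
    return min(1.0, step / warmup_steps)

def configure_optimizers(model: torch.nn.Module, warmup_steps: int = 2000):
    # warmup_steps=2000 is an assumed placeholder, not the value used
    # in the original report.
    optimizer = torch.optim.AdamW(
        model.parameters(), lr=0.0002, weight_decay=0.0, betas=(0.9, 0.95)
    )
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lambda step: warmup_factor(step, warmup_steps)
    )
    return optimizer, scheduler
```

With this factor the LR ramps linearly from 0 to 0.0002 over `warmup_steps` steps and then stays constant.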