Nan Zhang

Results: 5 issues by Nan Zhang

In full parameter update, I have recently been trying a new loss function: on top of the original next-token-prediction loss I add a regularization term so that the weights of certain layers are pushed to be as small as possible. However, I keep running into some strange bugs. Below is the code I added and the error I get, for example in lomo_trainer.py:

```python
lamda, regularization = 1, torch.tensor(0, requires_grad=True, dtype=torch.float32)
self.model.train()
for name, param in self.model.named_parameters():
    if "self_attn.q_proj" in name:
        with GatheredParameters(param):
            regularization =...
```
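For reference, a minimal plain-PyTorch sketch of the kind of penalty described above (it omits the DeepSpeed/ZeRO-3 `GatheredParameters` handling used in the LOMO trainer; `model`, `ce_loss`, and the `lamda` value are placeholders, not taken from the issue):

```python
import torch
import torch.nn as nn

# Sketch only: add an L2 penalty on the self_attn.q_proj weights to the
# next-token-prediction loss so those weights are pushed toward zero.
# Under ZeRO-3/LOMO the parameters are sharded and would need to be gathered
# first; that part is intentionally left out here.
def loss_with_q_proj_penalty(model: nn.Module, ce_loss: torch.Tensor,
                             lamda: float = 1.0) -> torch.Tensor:
    regularization = ce_loss.new_zeros(())  # scalar on the same device/dtype as the loss
    for name, param in model.named_parameters():
        if "self_attn.q_proj" in name:
            regularization = regularization + param.pow(2).sum()
    return ce_loss + lamda * regularization
```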

Hi, Thanks a lot for this awesome work! I am wondering whether there is a way to check the pruned but uncompressed model. Now when I save the model, they...
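If it helps, a small sketch of one way to inspect a pruned-but-uncompressed checkpoint, assuming the pruned weights are still stored as full-size tensors with zeros at the removed positions (the checkpoint path is a placeholder):

```python
import torch

# Sketch: report per-tensor sparsity of a saved (pruned but not yet compressed)
# checkpoint. The path below is a placeholder.
state_dict = torch.load("pruned_model/pytorch_model.bin", map_location="cpu")

for name, tensor in state_dict.items():
    if not torch.is_floating_point(tensor) or tensor.dim() < 2:
        continue  # skip packed/int tensors, biases, norm weights, etc.
    sparsity = (tensor == 0).float().mean().item()
    print(f"{name}: shape={tuple(tensor.shape)}, sparsity={sparsity:.2%}")
```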

Thanks for your great blog post on running [DeepSeek R1 Dynamic 1.58-bit](https://unsloth.ai/blog/deepseekr1-dynamic#running%20r1)! I notice that you used llama.cpp. However, if we need to run a bunch of different prompts and...
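One way to run many prompts against the same GGUF file without relaunching the llama.cpp CLI each time (assuming that is the intent of the question; the model path, context size, and prompts below are placeholders) is to load the model once through the llama-cpp-python bindings:

```python
from llama_cpp import Llama

# Sketch: load the quantized GGUF once, then reuse it for a list of prompts.
# model_path, n_ctx, and the prompts are placeholders.
llm = Llama(model_path="path/to/deepseek-r1-1.58bit.gguf", n_ctx=4096)

prompts = [
    "Explain the difference between pruning and quantization.",
    "Summarize the idea of dynamic 1.58-bit quantization.",
]
for prompt in prompts:
    out = llm(prompt, max_tokens=256)
    print(out["choices"][0]["text"])
```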

After quantization of LLaMA2-7b, I notice that the total number of parameters in the quantized model is around 1.1B, while the original dense model has around 6.7B parameters. It seems that the code...

After quantization of LLaMA2-7b, I notice that the quantized model reports far fewer total parameters than the original dense model. For example, running your code or using the quantized...
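A likely reason for this gap, assuming the quantized checkpoint stores packed integer weights (as GPTQ-style code typically does), is that the packed tensors are registered as buffers rather than `nn.Parameter`s, and each integer element packs several low-bit weights, so counting `model.parameters()` alone undercounts. A hedged sketch for checking both:

```python
import torch.nn as nn

# Sketch: count parameter and buffer elements separately. In GPTQ-style
# quantized models the packed weights (qweight/qzeros/scales-style tensors)
# are usually buffers, and one int32 element can pack e.g. eight 4-bit
# weights, so the raw element count sits well below the dense model's 6.7B.
def count_elements(model: nn.Module) -> None:
    n_params = sum(p.numel() for p in model.parameters())
    n_buffers = sum(b.numel() for b in model.buffers())
    print(f"parameters: {n_params / 1e9:.2f}B elements, "
          f"buffers: {n_buffers / 1e9:.2f}B elements")
```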