Nan Zhang

Results: 5 issues by Nan Zhang

In full parameter update, I have recently been trying a new loss function: on top of the original next-token-prediction loss I add a regularization term so that the weights of certain layers are pushed to be as small as possible. However, I keep running into some strange bugs. Below is the code I added and the error I get, for example in lomo_trainer.py:

```python
lamda, regularization = 1, torch.tensor(0, requires_grad=True, dtype=torch.float32)
self.model.train()
for name, param in self.model.named_parameters():
    if "self_attn.q_proj" in name:
        with GatheredParameters(param):
            regularization =...
```
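For reference, a minimal plain-PyTorch sketch of the kind of penalty described above (it omits the DeepSpeed/ZeRO-3 `GatheredParameters` handling used in the LOMO trainer; `model`, `ce_loss`, and the `lamda` value are placeholders, not taken from the issue):

```python
import torch
import torch.nn as nn

# Sketch only: add an L2 penalty on the self_attn.q_proj weights to the
# next-token-prediction loss so those weights are pushed toward zero.
# Under ZeRO-3/LOMO the parameters are sharded and would need to be gathered
# first; that part is intentionally left out here.
def loss_with_q_proj_penalty(model: nn.Module, ce_loss: torch.Tensor,
                             lamda: float = 1.0) -> torch.Tensor:
    regularization = ce_loss.new_zeros(())  # scalar on the same device/dtype as the loss
    for name, param in model.named_parameters():
        if "self_attn.q_proj" in name:
            regularization = regularization + param.pow(2).sum()
    return ce_loss + lamda * regularization
```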

Hi, Thanks a lot for this awesome work! I am wondering whether there is a way to check the pruned but uncompressed model. Now when I save the model, they...
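If it helps, a small sketch of one way to inspect a pruned-but-uncompressed checkpoint, assuming the pruned weights are still stored as full-size tensors with zeros at the removed positions (the checkpoint path is a placeholder):

```python
import torch

# Sketch: report per-tensor sparsity of a saved (pruned but not yet compressed)
# checkpoint. The path below is a placeholder.
state_dict = torch.load("pruned_model/pytorch_model.bin", map_location="cpu")

for name, tensor in state_dict.items():
    if not torch.is_floating_point(tensor) or tensor.dim() < 2:
        continue  # skip packed/int tensors, biases, norm weights, etc.
    sparsity = (tensor == 0).float().mean().item()
    print(f"{name}: shape={tuple(tensor.shape)}, sparsity={sparsity:.2%}")
```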

Thanks for your great blog post on running [DeepSeek R1 Dynamic 1.58-bit](https://unsloth.ai/blog/deepseekr1-dynamic#running%20r1)! I notice that you used llama.cpp. However, if we need to run a bunch of different prompts and...
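One way to run many prompts against the same GGUF file without relaunching the llama.cpp CLI each time (assuming that is the intent of the question; the model path, context size, and prompts below are placeholders) is to load the model once through the llama-cpp-python bindings:

```python
from llama_cpp import Llama

# Sketch: load the quantized GGUF once, then reuse it for a list of prompts.
# model_path, n_ctx, and the prompts are placeholders.
llm = Llama(model_path="path/to/deepseek-r1-1.58bit.gguf", n_ctx=4096)

prompts = [
    "Explain the difference between pruning and quantization.",
    "Summarize the idea of dynamic 1.58-bit quantization.",
]
for prompt in prompts:
    out = llm(prompt, max_tokens=256)
    print(out["choices"][0]["text"])
```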

After quantization of LLaMA2-7b, I notice that the total number of parameters in the quantized model is around 1.1B, while the original dense model has around 6.7B parameters. It seems that the code...

After quantization of LLaMA2-7b, I notice that the quantized model reports far fewer total parameters than the original dense model. For example, running your code or using the quantized...
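A likely reason for this gap, assuming the quantized checkpoint stores packed integer weights (as GPTQ-style code typically does), is that the packed tensors are registered as buffers rather than `nn.Parameter`s, and each integer element packs several low-bit weights, so counting `model.parameters()` alone undercounts. A hedged sketch for checking both:

```python
import torch.nn as nn

# Sketch: count parameter and buffer elements separately. In GPTQ-style
# quantized models the packed weights (qweight/qzeros/scales-style tensors)
# are usually buffers, and one int32 element can pack e.g. eight 4-bit
# weights, so the raw element count sits well below the dense model's 6.7B.
def count_elements(model: nn.Module) -> None:
    n_params = sum(p.numel() for p in model.parameters())
    n_buffers = sum(b.numel() for b in model.buffers())
    print(f"parameters: {n_params / 1e9:.2f}B elements, "
          f"buffers: {n_buffers / 1e9:.2f}B elements")
```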