> Hi,
>
> I apologise for the delay. When I get some spare time after some current research projects I will upload this code.

Looking forward to the code...
No. We use xformers for training, and a naive implementation for inference.
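For context, a "naive implementation" of attention just materialises the full score matrix and applies a softmax, rather than using xformers' memory-efficient kernels. A minimal pure-Python sketch (illustrative only, not Baichuan2's actual inference code; function names are hypothetical):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def naive_attention(Q, K, V):
    """Naive scaled dot-product attention over lists of row vectors.

    Computes softmax(Q K^T / sqrt(d)) V explicitly, which is O(n^2)
    in memory -- fine for inference, costly for long training batches.
    """
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Libraries like xformers compute the same result without ever storing the full `scores` matrix, which is why they are preferred for training.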
Can you provide code snippets to reproduce?
Can you provide the corresponding input sentence?
You need to manipulate NormHead.weight, following https://huggingface.co/docs/transformers/main_classes/model#transformers.PreTrainedModel.resize_token_embeddings. Just initialise a new parameter with the desired shape, say `new_param`, copy the old weights with `new_param[:origin_vocab_size] = normhead.weight`, then replace the NormHead weight...
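The copy-and-replace step above can be sketched as follows. This is a minimal pure-Python illustration of the pattern (the function name and the zero-init for new rows are assumptions for the example; a real resize would use the model's own init scheme and torch tensors):

```python
def resize_normhead_weight(old_weight, new_vocab_size, hidden_size):
    """Grow a (vocab_size, hidden_size) weight matrix to new_vocab_size rows.

    Rows for the original vocabulary are copied over unchanged;
    rows for newly added tokens are zero-initialised here purely
    for illustration.
    """
    new_weight = [[0.0] * hidden_size for _ in range(new_vocab_size)]
    for i, row in enumerate(old_weight):
        # new_param[:origin_vocab_size] = normhead.weight
        new_weight[i] = list(row)
    return new_weight
```

After building `new_weight`, you would wrap it as a parameter and assign it back in place of the old NormHead weight.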
@shyoulala @snowlixue @Ignoramus0817 refer to https://github.com/baichuan-inc/Baichuan2/issues/155
z-loss was adopted in our training, but it is not necessary, so we turned it off in the open-source code.
> > z-loss was adopted in our training, but it is not necessary, so we turned it off in the open-source code.

hi @mmmans, do you mean it's...
> @mmmans I have added thousands of new tokens and fine-tuned all parameters. Do I need to set z_loss_weight?

It depends on your own setting, actually. If your training...
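For reference, z-loss is an auxiliary term that penalises the squared log-partition of the softmax, discouraging the logits from drifting to large magnitudes during training. A minimal sketch (the function name and default weight are illustrative, not taken from Baichuan2's code):

```python
import math

def z_loss(logits, z_loss_weight=1e-4):
    """Auxiliary z-loss for one position: weight * log(Z)^2,
    where Z = sum(exp(logits)) is the softmax normaliser.

    Computed via a numerically stable log-sum-exp.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return z_loss_weight * log_z ** 2
```

In training this term is added to the cross-entropy loss; with the weight set to 0 it vanishes, which is effectively what turning it off in the open-source code amounts to.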
How long is your text?