zengxianfeng

Results 36 comments of zengxianfeng

@nreimers Emmm... I set the initializers of U, b_start, b_end and the initial state in viterbi_decode to zeros, but it doesn't work. Maybe post-processing is the only way. But I am still...
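For context on what viterbi_decode does with those parameters: Viterbi keeps, for each tag, the best score of any path ending in that tag, then backtracks. A minimal NumPy sketch (the function and argument names here are illustrative, not the actual library code being discussed; zeroing the transition scores makes it degenerate to per-token argmax, which is why zero initializers alone may not help):

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence for one sentence.

    emissions:   (seq_len, num_tags) per-token tag scores
    transitions: (num_tags, num_tags) score of moving from tag i to tag j
    """
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()          # best score of a path ending in each tag
    backpointers = []
    for t in range(1, seq_len):
        # broadcast: score[i] + transitions[i, j] + emissions[t, j] for every (i, j)
        total = score[:, None] + transitions + emissions[t][None, :]
        backpointers.append(total.argmax(axis=0))
        score = total.max(axis=0)
    # follow backpointers from the best final tag
    best_tag = int(score.argmax())
    path = [best_tag]
    for bp in reversed(backpointers):
        best_tag = int(bp[best_tag])
        path.append(best_tag)
    return path[::-1]
```

With `transitions` all zero, the returned path is just the per-token argmax of `emissions`, so the CRF contributes nothing beyond the emission scores.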

> I recommend this can hence be safely closed.

I encountered a similar issue where the results of MixedFusedRMSNorm and LLAMA's RMSNorm are inconsistent when applied to the same tensor....
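Discrepancies between two RMSNorm implementations on the same tensor usually come down to accumulation dtype (fp16 vs fp32) or where the epsilon sits, rather than a different formula. A minimal reference implementation of the RMSNorm formula, x * w / sqrt(mean(x²) + eps), in NumPy (a sketch for comparison purposes, not either library's actual kernel):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: scale by the root-mean-square over the last axis,
    # then apply a learned per-feature weight (no mean subtraction).
    # Accumulating the mean in float64/float32 matters: fp16 accumulation
    # is one common source of mismatches between fused kernels.
    rms = np.sqrt(np.mean(x.astype(np.float64) ** 2, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight
```

When comparing two kernels, cast both inputs to fp32 first; otherwise identical math can still disagree at fp16 precision.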

I retrained my model, but the problem persists.

Token batching is a necessary feature for some tasks, such as machine translation, as it is a recognized setting in the field. When you want to make sure that your experimental...
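Token batching groups sentences so that each batch holds roughly a fixed token budget rather than a fixed number of sentences, which keeps GPU memory use stable across batches of very different sentence lengths. A minimal sketch (function name and the sum-of-lengths budget are simplifications; real implementations usually budget by max-length × batch-size to account for padding):

```python
def token_batches(sentences, max_tokens):
    """Group tokenized sentences so each batch holds at most ~max_tokens tokens.

    Sorting by length first keeps padding waste low, since the longest
    sentence in a batch determines the padded width.
    """
    batches, batch, batch_len = [], [], 0
    for sent in sorted(sentences, key=len):
        n = len(sent)  # tokens in this sentence
        if batch and batch_len + n > max_tokens:
            batches.append(batch)
            batch, batch_len = [], 0
        batch.append(sent)
        batch_len += n
    if batch:
        batches.append(batch)
    return batches
```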

Compilation of the fused_kernels gets stuck when training on multiple nodes, but it works fine on a single node. Why?

> Hi @SefaZeng, you can use the torch.distributed launcher w. DeepSpeed-enabled code. Would that help your issue here?

Thanks for your reply! Are there any script examples? Like how to...
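For what such a launch typically looks like: the `torch.distributed` launcher starts one process per GPU on each node, and the training script enables DeepSpeed via its own flags. A hedged sketch (the script name, addresses, GPU counts, and config path below are placeholders, not taken from the thread):

```shell
# Hypothetical two-node launch; run the same command on each node,
# changing only --node_rank (0 on the master node, 1 on the other).
python -m torch.distributed.launch \
  --nnodes=2 --node_rank=0 --nproc_per_node=8 \
  --master_addr=10.0.0.1 --master_port=29500 \
  pretrain_gpt.py --deepspeed --deepspeed_config ds_config.json
```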

> There is not. I recommend you shard your dataset like the Pile (https://the-eye.eu/public/AI/pile/train/) and then unpack and train on one shard at a time.

Thank you for your reply....
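The shard-at-a-time approach can be as simple as globbing the shard files in a stable order and handing them to the training loop one by one. A minimal sketch (directory layout, file pattern, and `train_fn` are hypothetical):

```python
from pathlib import Path

def iter_shard_files(data_dir, pattern="*.jsonl"):
    """Yield shard paths in sorted order so a run can resume at a known shard."""
    yield from sorted(Path(data_dir).glob(pattern))

def train_on_shards(data_dir, train_fn):
    # Unpack/train one shard at a time instead of loading the whole corpus,
    # keeping peak disk and memory usage bounded by a single shard.
    for shard in iter_shard_files(data_dir):
        train_fn(shard)
```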

> Depends on what do you mean by "extract raw text". Extract from what?

I mean how to extract the Thai content from the XML files which were downloaded from...
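Assuming the XML files are MediaWiki-style dumps (the usual source for per-language Wikipedia text), the page text sits in `<text>` elements and can be pulled out with the standard library. A small sketch (the element layout is an assumption; for real multi-gigabyte dumps, `ET.iterparse` over the file is preferable to loading it whole):

```python
import xml.etree.ElementTree as ET

def extract_texts(xml_string):
    """Collect the contents of every <text> element, ignoring XML namespaces."""
    root = ET.fromstring(xml_string)
    texts = []
    for elem in root.iter():
        # strip a namespace prefix: '{ns}text' -> 'text'
        if elem.tag.rsplit("}", 1)[-1] == "text" and elem.text:
            texts.append(elem.text)
    return texts
```

Note that dump `<text>` holds wiki markup, not clean prose; a markup stripper is still needed afterwards to get raw Thai text.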

> Hi @SefaZeng You should consider merging the lora layers and run the merged model as a standalone `transformers` model
>
> ```python
> model = model.merge_and_unload()
> ```
>
> ...