zengxianfeng

Results 36 comments of zengxianfeng

@nreimers Emmm... I set the initializers of U, b_start, b_end and the initial state in viterbi_decode to zeros, but it doesn't work. Maybe post-processing is the only way. But I am still...
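For context on what viterbi_decode does with those parameters: Viterbi keeps, for each tag, the best score of any path ending in that tag, then backtracks. A minimal NumPy sketch (the function and argument names here are illustrative, not the actual library code being discussed; zeroing the transition scores makes it degenerate to per-token argmax, which is why zero initializers alone may not help):

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence for one sentence.

    emissions:   (seq_len, num_tags) per-token tag scores
    transitions: (num_tags, num_tags) score of moving from tag i to tag j
    """
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()          # best score of a path ending in each tag
    backpointers = []
    for t in range(1, seq_len):
        # broadcast: score[i] + transitions[i, j] + emissions[t, j] for every (i, j)
        total = score[:, None] + transitions + emissions[t][None, :]
        backpointers.append(total.argmax(axis=0))
        score = total.max(axis=0)
    # follow backpointers from the best final tag
    best_tag = int(score.argmax())
    path = [best_tag]
    for bp in reversed(backpointers):
        best_tag = int(bp[best_tag])
        path.append(best_tag)
    return path[::-1]
```

With `transitions` all zero, the returned path is just the per-token argmax of `emissions`, so the CRF contributes nothing beyond the emission scores.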

> I recommend this can hence be safely closed.

I encountered a similar issue where the results of MixedFusedRMSNorm and LLAMA's RMSNorm are inconsistent when applied to the same tensor....
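Discrepancies between two RMSNorm implementations on the same tensor usually come down to accumulation dtype (fp16 vs fp32) or where the epsilon sits, rather than a different formula. A minimal reference implementation of the RMSNorm formula, x * w / sqrt(mean(x²) + eps), in NumPy (a sketch for comparison purposes, not either library's actual kernel):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: scale by the root-mean-square over the last axis,
    # then apply a learned per-feature weight (no mean subtraction).
    # Accumulating the mean in float64/float32 matters: fp16 accumulation
    # is one common source of mismatches between fused kernels.
    rms = np.sqrt(np.mean(x.astype(np.float64) ** 2, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight
```

When comparing two kernels, cast both inputs to fp32 first; otherwise identical math can still disagree at fp16 precision.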

I retrained my model, but the problem persists.

Token batching is a necessary feature for some tasks, such as machine translation, as it is a recognized setting in the field. When you want to make sure that your experimental...
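Token batching groups sentences so that each batch holds roughly a fixed token budget rather than a fixed number of sentences, which keeps GPU memory use stable across batches of very different sentence lengths. A minimal sketch (function name and the sum-of-lengths budget are simplifications; real implementations usually budget by max-length × batch-size to account for padding):

```python
def token_batches(sentences, max_tokens):
    """Group tokenized sentences so each batch holds at most ~max_tokens tokens.

    Sorting by length first keeps padding waste low, since the longest
    sentence in a batch determines the padded width.
    """
    batches, batch, batch_len = [], [], 0
    for sent in sorted(sentences, key=len):
        n = len(sent)  # tokens in this sentence
        if batch and batch_len + n > max_tokens:
            batches.append(batch)
            batch, batch_len = [], 0
        batch.append(sent)
        batch_len += n
    if batch:
        batches.append(batch)
    return batches
```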

Compilation of the fused_kernels gets stuck when training on multiple nodes, but it works fine on a single node. Why?

> Hi @SefaZeng, you can use the torch.distributed launcher w. DeepSpeed-enabled code. Would that help your issue here?

Thanks for your reply! Are there any script examples? Like how to...
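For what such a launch typically looks like: the `torch.distributed` launcher starts one process per GPU on each node, and the training script enables DeepSpeed via its own flags. A hedged sketch (the script name, addresses, GPU counts, and config path below are placeholders, not taken from the thread):

```shell
# Hypothetical two-node launch; run the same command on each node,
# changing only --node_rank (0 on the master node, 1 on the other).
python -m torch.distributed.launch \
  --nnodes=2 --node_rank=0 --nproc_per_node=8 \
  --master_addr=10.0.0.1 --master_port=29500 \
  pretrain_gpt.py --deepspeed --deepspeed_config ds_config.json
```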

> There is not. I recommend you shard your dataset like the Pile (https://the-eye.eu/public/AI/pile/train/) and then unpack and train on one shard at a time.

Thank you for your reply....
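The shard-at-a-time approach can be as simple as globbing the shard files in a stable order and handing them to the training loop one by one. A minimal sketch (directory layout, file pattern, and `train_fn` are hypothetical):

```python
from pathlib import Path

def iter_shard_files(data_dir, pattern="*.jsonl"):
    """Yield shard paths in sorted order so a run can resume at a known shard."""
    yield from sorted(Path(data_dir).glob(pattern))

def train_on_shards(data_dir, train_fn):
    # Unpack/train one shard at a time instead of loading the whole corpus,
    # keeping peak disk and memory usage bounded by a single shard.
    for shard in iter_shard_files(data_dir):
        train_fn(shard)
```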

> Depends on what do you mean by "extract raw text". Extract from what?

I mean how to extract the Thai content from the XML files which were downloaded from...
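Assuming the XML files are MediaWiki-style dumps (the usual source for per-language Wikipedia text), the page text sits in `<text>` elements and can be pulled out with the standard library. A small sketch (the element layout is an assumption; for real multi-gigabyte dumps, `ET.iterparse` over the file is preferable to loading it whole):

```python
import xml.etree.ElementTree as ET

def extract_texts(xml_string):
    """Collect the contents of every <text> element, ignoring XML namespaces."""
    root = ET.fromstring(xml_string)
    texts = []
    for elem in root.iter():
        # strip a namespace prefix: '{ns}text' -> 'text'
        if elem.tag.rsplit("}", 1)[-1] == "text" and elem.text:
            texts.append(elem.text)
    return texts
```

Note that dump `<text>` holds wiki markup, not clean prose; a markup stripper is still needed afterwards to get raw Thai text.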

> Hi @SefaZeng You should consider merging the lora layers and run the merged model as a standalone `transformers` model
>
> ```python
> model = model.merge_and_unload()
> ```
>
> ...