Yu Zhang
@divija96 Hi, could you check your Python version? I encountered this error under py27 as well. Please make sure your environment is Python >= 3.6.
Could you provide some examples from the processed prop file? I suspect there might be some errors.
Oh, it looks like you may need to switch back to logsigmoid; -exp is not stable yet.
This update fixes potential NaNs during inference, so I don't think it's the issue. It is possibly caused by a potential inf gradient of -exp; I will check it, thank you.
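Roughly what I mean, as a minimal sketch in plain PyTorch (not the fla kernels), is that the `-exp` gate parameterization can overflow in both the forward value and its gradient, while `logsigmoid` stays bounded:

```python
import torch
import torch.nn.functional as F

# Hypothetical gate pre-activations; a large value like this can appear
# after a few unstable training steps.
x = torch.tensor([1.0, 10.0, 100.0], requires_grad=True)

# -exp parameterization: both the forward value and the gradient are
# -exp(x), which overflows float32 to -inf for large x.
g_exp = -torch.exp(x)
g_exp.sum().backward()
print(g_exp)   # tensor([-2.7183e+00, -2.2026e+04, -inf])
print(x.grad)  # the gradient -exp(x) also hits -inf

x.grad = None

# logsigmoid parameterization: the forward value is <= 0 and the
# gradient sigmoid(-x) is bounded in (0, 1), so it cannot overflow.
g_ls = F.logsigmoid(x)
g_ls.sum().backward()
print(g_ls)    # finite, non-positive values
print(x.grad)  # bounded gradients in (0, 1)
```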
Have you compared the kernel speeds?
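Something like the following would do (a sketch only: the shapes and the two callables `kernel_a`/`kernel_b` are placeholders, not actual fla entry points; in practice you would drop in the two kernels you want to compare):

```python
import torch
import triton

# Placeholder inputs for the comparison.
B, H, T, D = 4, 8, 2048, 64
q = torch.randn(B, H, T, D, device='cuda', dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

def kernel_a():
    # stand-in for one implementation (e.g. the chunk kernel)
    return (q @ k.transpose(-1, -2)).tril() @ v

def kernel_b():
    # stand-in for the other implementation (e.g. the fused recurrent kernel);
    # here it is the same math purely for illustration
    return (q @ k.transpose(-1, -2)).tril() @ v

# do_bench handles warmup and CUDA synchronization and reports time in ms.
ms_a = triton.testing.do_bench(kernel_a)
ms_b = triton.testing.do_bench(kernel_b)
print(f'kernel_a: {ms_a:.3f} ms, kernel_b: {ms_b:.3f} ms')
```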
You can enable gradients for h0 manually.
Would taking h0 as a learnable parameter be OK? Something like `h0 = nn.Parameter(torch.zeros(key_dim, head_dim))`?
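As a minimal sketch (plain PyTorch; the shape `(num_heads, key_dim, head_dim)` and how h0 is consumed by the layer are assumptions for illustration only):

```python
import torch
import torch.nn as nn

class LearnableInitialState(nn.Module):
    """Hold h0 as a learnable parameter and broadcast it per batch."""
    def __init__(self, num_heads: int, key_dim: int, head_dim: int):
        super().__init__()
        # nn.Parameter must wrap a tensor; nn.Parameter(key_dim, head_dim) is invalid.
        self.h0 = nn.Parameter(torch.zeros(num_heads, key_dim, head_dim))

    def forward(self, batch_size: int) -> torch.Tensor:
        # Share the same initial state across the batch dimension.
        return self.h0.unsqueeze(0).expand(batch_size, -1, -1, -1)

state = LearnableInitialState(num_heads=4, key_dim=64, head_dim=64)
h0 = state(batch_size=2)  # (2, 4, 64, 64), tracked by autograd

# Or, to enable gradients on a plain tensor manually:
h0_plain = torch.zeros(4, 64, 64).requires_grad_(True)
```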
I see, there is currently no access to the gradient of the states; we will add an option later.
@JL-er Hi, check it out: https://github.com/sustcsonglin/flash-linear-attention/commit/1547448b998a163fdb33c49266da699db13f2dc8 We no longer truncate the gradient of the h states for RWKV6, for ease of state tuning. Do contact us if you meet any bugs...
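For reference, state tuning then looks roughly like this once gradients flow back through the h states: freeze the model weights and optimize only the initial state. This is a sketch only; `model`, the `initial_state` keyword, the state shape, and the objective are placeholders, not the actual RWKV6 setup.

```python
import torch
import torch.nn.functional as F

def tune_state(model, data, num_heads=4, key_dim=64, head_dim=64, lr=1e-2):
    # Freeze all model weights; only the initial state is trained.
    for p in model.parameters():
        p.requires_grad_(False)

    # The trainable initial state, shared across the batch.
    h0 = torch.zeros(num_heads, key_dim, head_dim, requires_grad=True)
    optimizer = torch.optim.Adam([h0], lr=lr)

    for x, y in data:
        out = model(x, initial_state=h0)                      # assumed keyword
        loss = F.cross_entropy(out.flatten(0, 1), y.flatten())  # placeholder objective
        optimizer.zero_grad()
        loss.backward()  # only works if the grad of h states is not truncated
        optimizer.step()
    return h0
```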
@Ronsor Hi, may I know why you need this? I think it would be hard to use `fla` anyway if `transformers` is unavailable. Currently this package is heavily tied to 🤗...