Wang hl
> https://github.com/BlinkDL/RWKV-v2-RNN-Pile What kind of finetuning methods does this use? I think it tunes all parameters in the model?
I found a good solution to this problem. Since the latest version of transformers supports RWKV, I can now use peft to finetune RWKV. Here is the demo code: ```...
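Since the original snippet is cut off above, here is a minimal sketch of what LoRA finetuning of the HF RWKV model with peft can look like; the checkpoint name and `target_modules` are assumptions based on the public RWKV checkpoints and the transformers RWKV implementation, not the original demo code.

```python
# Minimal LoRA sketch, not the original demo code: the model id and
# target_modules below are assumptions, adjust them for your setup.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

model_name = "RWKV/rwkv-4-169m-pile"  # hypothetical checkpoint choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    # projection layers in the HF RWKV time/channel-mixing blocks
    target_modules=["key", "value", "receptance"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```

From here the wrapped model can be trained with the standard `Trainer` loop, exactly like any other causal LM.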
> assume that I have training data - json or tsv - in the format {"instruction": "THE INSTRUCTION", "input": "THE INPUT", "output": "DESIRED OUTPUT"}. How can I modify your peft code to...
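For the instruction/input/output records asked about above, one common approach is to flatten each record into a single prompt string and tokenize that for causal-LM training. The template below is only an illustration, not part of the original peft demo code.

```python
# Illustrative only: the prompt template and field handling are assumptions,
# not part of the original peft demo code.
import json

def build_prompt(example: dict) -> str:
    """Flatten one {"instruction", "input", "output"} record into one string."""
    if example.get("input"):
        return (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    return (
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['output']}"
    )

def load_dataset_texts(path: str) -> list[str]:
    """Read a JSON-lines file and return one prompt string per record."""
    with open(path, encoding="utf-8") as f:
        return [build_prompt(json.loads(line)) for line in f]

# texts = load_dataset_texts("train.jsonl")
# encodings = tokenizer(texts, truncation=True, max_length=1024)
```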
I think HF llama does not have a static kv cache, since its cache grows dynamically during generation. Here is the relevant code: https://github.com/huggingface/transformers/blob/38611086d293ea4a5809bcd7fadd8081d55cb74e/src/transformers/models/llama/modeling_llama.py#L1014C37-L1014C37 However, I also have the...
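For context, the dynamic behaviour mentioned above comes down to concatenating each step's key/value states onto the cached ones, so the cache tensors grow with the sequence instead of living in a preallocated buffer. The snippet below is a simplified illustration of that pattern, not the actual modeling_llama.py code linked above.

```python
# Simplified illustration of a dynamically growing KV cache,
# not the actual modeling_llama.py code linked above.
import torch

def append_to_cache(past_key_value, key_states, value_states):
    """Concatenate this step's key/value states onto the cached ones.

    Shapes are (batch, num_heads, seq_len, head_dim); the cache length
    grows by one on every decoding step, so no fixed-size buffer exists.
    """
    if past_key_value is not None:
        key_states = torch.cat([past_key_value[0], key_states], dim=2)
        value_states = torch.cat([past_key_value[1], value_states], dim=2)
    return key_states, value_states  # becomes the new past_key_value

cache = None
for _ in range(4):  # four decoding steps of one token each
    k = torch.randn(1, 8, 1, 64)
    v = torch.randn(1, 8, 1, 64)
    k_all, v_all = append_to_cache(cache, k, v)
    cache = (k_all, v_all)
print(cache[0].shape)  # torch.Size([1, 8, 4, 64])
```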
Thanks for your attention. I hope this code helps you with testing :) ``` class StochasticDuelingHead(nn.Module): """ Overview: The ``Stochastic Dueling Network`` proposed in the ACER paper (arXiv 1611.01224). \...
Any new progress on this issue?
Same question. This leads to a strange situation. The final KL loss is computed like: `` kl_penalty = -self.kl_penalty_weight * (logprobs - ref_logprob) `` However, the part ``ref_logprob`` does not...
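For reference, the quoted line is the per-token KL-style penalty used in PPO-based RLHF, `-beta * (log pi_theta - log pi_ref)`. The sketch below only illustrates that computation with `ref_logprob` coming from a frozen reference model; it is not the project's implementation, and whether `ref_logprob` should carry gradients is exactly the point under discussion.

```python
# Illustration of the quoted per-token KL penalty, not the project's code.
# Here ref_logprobs come from a frozen reference policy under no_grad.
import torch

def kl_penalty_reward(logprobs: torch.Tensor,
                      ref_logprobs: torch.Tensor,
                      kl_penalty_weight: float) -> torch.Tensor:
    """Per-token penalty: -beta * (log pi_theta(a|s) - log pi_ref(a|s))."""
    return -kl_penalty_weight * (logprobs - ref_logprobs)

# Example shapes: (batch, seq_len) token log-probabilities from both policies.
logprobs = torch.randn(2, 5, requires_grad=True)
with torch.no_grad():
    ref_logprobs = torch.randn(2, 5)  # frozen reference policy, no gradient
penalty = kl_penalty_reward(logprobs, ref_logprobs, kl_penalty_weight=0.1)
print(penalty.shape)  # torch.Size([2, 5])
```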