PENG Bo


> @BlinkDL You said earlier `ChatRWKV v2: with "stream" and "split" strategies. 3G VRAM is enough to run RWKV 14B` Yet @oobabooga said it went OOM on a 3090 (24GB...

> If it replies faster/better than a regular 13b even with the split, it's still something. Plus the faster time to train. But I guess miracles we will not get....

> @BlinkDL but it's 24 GB, not 3 GB; I really wanted to run that on a 3080 Ti, which only has 12 GB. "cuda fp16 *12 -> cpu fp32" [try increasing 12, for better...

Hi :) As I said before: try increasing the "30" for better speed, until you run out of VRAM. @Ph0rk0z Increase the "30" in "cuda fp16 *30" to compute more layers on...
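To make the strategy string concrete, here is a minimal sketch. The helper name `make_strategy` is hypothetical (not part of ChatRWKV); only the string format it produces comes from the comments above: the number after `*` is how many layers run on the GPU in fp16, and the remaining layers fall through to CPU fp32.

```python
def make_strategy(gpu_layers: int) -> str:
    """Build a ChatRWKV split-strategy string.

    gpu_layers: number of layers to compute on the GPU in fp16;
    the rest run on the CPU in fp32. Raise this number until you
    run out of VRAM, as suggested above.
    """
    return f"cuda fp16 *{gpu_layers} -> cpu fp32"

print(make_strategy(30))  # cuda fp16 *30 -> cpu fp32
```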

Moreover, set `os.environ["RWKV_CUDA_ON"] = '1'` in https://github.com/oobabooga/text-generation-webui/blob/main/modules/RWKV.py for a roughly 10x speedup in reply time.
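A minimal sketch of how this fits together: the environment variable has to be set before the `rwkv` package is imported, so the CUDA kernel gets built/loaded. The model path below is a placeholder, and the (commented-out) load call assumes the `rwkv` pip package API that ChatRWKV and text-generation-webui use.

```python
import os

# Must be set *before* importing rwkv, so its custom CUDA kernel is
# compiled and used (requires a working CUDA toolchain; leave at '0'
# to fall back to the pure-PyTorch path).
os.environ["RWKV_CUDA_ON"] = "1"

# Sketch of loading with a split strategy (path is a placeholder):
# from rwkv.model import RWKV
# model = RWKV(model="/path/to/RWKV-4-Pile-14B",
#              strategy="cuda fp16 *30 -> cpu fp32")
```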

It's purely a PyTorch issue, because CPU utilization is fine on most Intel CPUs and AMD server CPUs. I will ask the PyTorch team. Please see whether "cuda fp16 *29+" will...

> All in all this model handles 4096 context well enough. Maybe the limits should be raised.

RWKV-ctx4096 models can handle ctx4k :) The difference between cuda and non-cuda is...

Great work :) My idea is to keep the main ChatRWKV repository simple (easy for everyone to learn its code), while having some community forks with cutting-edge features. If you...

> Nothing offensive, but honestly I found the `global` variables are abused in the current code base. It may be a challenge for newcomers to read and understand the...

Now ChatRWKV v2 supports this too :)