PENG Bo
> @BlinkDL You said earlier `ChatRWKV v2: with "stream" and "split" strategies. 3G VRAM is enough to run RWKV 14B` Yet @oobabooga said it went OOM on a 3090 (24GB...
> If it replies faster/better than a regular 13b even with the split, it's still something. Plus the faster time to train. But I guess miracles we will not get....
> @BlinkDL but it's 24GB, not 3GB. I really wanted to run that on a 3080 Ti, which only has 12GB "cuda fp16 *12 -> cpu fp32" [try increasing 12, for better...
Hi :) As I said before: [try increasing 30, for better speed, until you run out of VRAM]. @Ph0rk0z Increase the "30" in `cuda fp16 *30` to compute more layers on...
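The idea behind the strategy string can be sketched as follows: each segment names a device and dtype, an optional `*N` caps how many layers that segment takes, and the final segment absorbs the rest. `parse_strategy` below is a hypothetical illustrative helper, not part of ChatRWKV's actual code.

```python
# Hypothetical sketch of how a strategy string such as
# "cuda fp16 *30 -> cpu fp32" splits layers across devices.
# parse_strategy() is an illustrative helper, NOT the ChatRWKV API.

def parse_strategy(strategy: str, n_layer: int):
    """Assign each of n_layer layers to a (device, dtype) pair.

    Segments look like "<device> <dtype>" with an optional "*N"
    layer count, joined by "->". A segment without "*N" takes
    all remaining layers.
    """
    plan = []
    remaining = n_layer
    for segment in strategy.split("->"):
        parts = segment.split()
        device, dtype = parts[0], parts[1]
        count = remaining
        if len(parts) > 2 and parts[2].startswith("*"):
            count = min(int(parts[2].lstrip("*")), remaining)
        plan.extend([(device, dtype)] * count)
        remaining -= count
    return plan

# For a 40-layer model: first 30 layers on GPU in fp16,
# the remaining 10 on CPU in fp32.
plan = parse_strategy("cuda fp16 *30 -> cpu fp32", 40)
```

Raising the `*30` moves more layers onto the GPU, which is faster, until VRAM runs out.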
Moreover, set os.environ["RWKV_CUDA_ON"] = '1' in https://github.com/oobabooga/text-generation-webui/blob/main/modules/RWKV.py for 10x speedup of reply time
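A minimal sketch of that setting: the environment variable should be set before the model code is imported, since (to the best of my understanding) the flag is read at import time to decide whether to compile the custom CUDA kernel.

```python
import os

# Set this BEFORE importing the RWKV model code; the flag is read
# when the module loads to decide whether to build the custom CUDA
# kernel (requires a working CUDA toolchain on the machine).
os.environ["RWKV_CUDA_ON"] = "1"

# from rwkv.model import RWKV  # import only after setting the flag
```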
It's purely a PyTorch issue, because CPU utilization is fine on most Intel CPUs and AMD server CPUs. I will ask the PyTorch team. Please see whether "cuda fp16 *29+" will...
> All in all this model handles 4096 context well enough. Maybe the limits should be raised.

RWKV-ctx4096 models can handle ctx4k :) The difference between cuda and non-cuda is...
Great work :) My idea is to keep the main ChatRWKV repository simple (easy for everyone to learn its code), while having some community forks with cutting-edge functions. If you...
> Nothing offensive, but honestly I found the `global` variables are abused in the current code base. It may be a challenge for newcomers to read and understand the...
Now ChatRWKV v2 supports this too :)