mrgaolei
The size of my SD card is 1GB. I formatted it as FAT32 via the Mac terminal: ``` diskutil eraseVolume FAT32 name /Volumes/SDCARD ``` But I will buy a bigger SD card, are you...
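For reference, a minimal sketch of erasing the whole card rather than just the mounted volume; the device identifier `/dev/disk2` is an assumption, so confirm it with `diskutil list` first. Note that diskutil expects FAT32 labels to be uppercase and at most 11 characters.

```bash
# Find the SD card's device identifier (e.g. /dev/disk2) before erasing anything
diskutil list

# Erase the whole card as FAT32 with an MBR partition map;
# "SDCARD" is the new (uppercase) volume label
diskutil eraseDisk FAT32 SDCARD MBRFormat /dev/disk2
```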
When should I insert the Nintendo Super Star Brow DISC? I don't have this DISC yet; does the game list require it?
Got it. I changed to a different SD card and everything works now.
> Is it possible that the bottleneck is the GPU now? The 3090 is quite close to the V100 in terms of FLOPs. Just a guess. However, aren't ktransformers entirely computed by...
And this is `rpc.log`:
```
[2025-08-12 17:21:55.161] [info] [scheduler.cpp:31] Number of available GPUs: 1, want 1
[2025-08-12 17:21:55.161] [info] [scheduler.cpp:66] Each GPU Total: 2196MiB, Model Params: 0MiB, KVCache: 2196MiB, Left:...
```
A further note: after switching to git tag v0.3.2, starting without balance_serve works fine, while starting with balance_serve reports this error. If I git pull to HEAD, neither works, but it looks like HEAD already defaults to the balance_serve engine. Does that mean balance_serve requires 512GB of RAM? The only change on this machine is that I downgraded the RAM from 512GB to 256GB, yet I'm running the Q2 model, which should theoretically fit, since the default kt engine can start it.
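One way to tell whether this is really memory pressure on the 256GB box rather than a hard engine requirement is to watch RAM while balance_serve loads the model and check for OOM kills after a failed start; a rough sketch:

```bash
# Refresh memory usage every second while the model is loading
watch -n 1 free -h

# After a failed start, check whether the kernel's OOM killer stepped in
sudo dmesg | grep -i -E "out of memory|oom-kill"
```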
> First, AMX only accelerates the prefill stage. Second, under the same configuration, if BF16 generation were faster than Q4 you could go claim a Turing Award. So with a Q4 model, even the prefill stage isn't accelerated, is that right?
> The issue is that you used `--kt-num-gpu-experts 16`, which specifies that each layer has 16 experts on the GPU. The 24GB VRAM can’t handle that, so try lowering this...
> I see. The native Kimi-K2-Thinking model uses BF16-precision (non-expert) weights on the GPU side, so it consumes more VRAM than DeepSeek-V3/R1. A 24 GB GPU isn’t sufficient. You may...
```bash
free -h
```
Look at buffer/cache; that's where most of the memory is going.
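A short sketch of how to read and, if needed, release that cache; the page cache is reclaimed automatically under memory pressure, so dropping it is only useful for measurement:

```bash
# In the free output, "buff/cache" is reclaimable page cache and "available"
# is the realistic free figure
free -h

# Flush dirty pages, then drop clean page cache, dentries and inodes (root required)
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
```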