
How to run on A100 40G?

Open · TopIdiot opened this issue 1 year ago · 3 comments

When I run ./test_compute ../config_all/llama3-8B/1024.json directly, I get "Got bad cuda status: out of memory at line: 27/root/Nanoflow/pipeline/src/vortexData.cu". After changing model_configs.allocate_kv_data_batch in the config to 100, I get Segmentation fault (core dumped). Reducing pipeline_configs also results in Segmentation fault (core dumped).

I want to know if there are rules on how to configure it when using different kinds of GPUs.
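For a rough sense of why a 40 GB A100 runs out of memory, a back-of-envelope KV-cache sizing can help. The sketch below uses Llama-3-8B's published shape (32 layers, 8 KV heads via GQA, head_dim 128, fp16); the 20 GB KV budget is an illustrative assumption (weights alone take ~16 GB in fp16), not Nanoflow's actual allocator behavior, and `allocate_kv_data_batch` may map to this differently.

```python
def kv_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    """Bytes of KV cache one token occupies across all layers.

    Factor of 2 accounts for the separate K and V tensors per layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

per_token = kv_bytes_per_token()            # 131072 bytes = 128 KiB per token
kv_budget = 20 * 1024**3                    # assumed: ~20 GiB left for KV on a 40G A100
max_tokens = kv_budget // per_token         # upper bound on cached tokens
print(per_token, max_tokens)
```

If the configured KV capacity (batch size × sequence length) implies more tokens than this bound, an out-of-memory error at allocation time is expected.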

TopIdiot avatar Dec 20 '24 03:12 TopIdiot

the same question...

durant1999 avatar Jan 06 '25 09:01 durant1999

Got bad cuda status: out of memory at line: 27/ai/zhiyi/w/multimodal/openbmb/Nanoflow/pipeline/src/vortexData.cu — same error on a 4090 24G.

fangbaolei avatar Jan 22 '25 06:01 fangbaolei

We have upgraded our codebase from C++ to Python, so the configuration is now clearer and you can explicitly define the KV cache capacity in your own code. One more thing to be aware of: the A100 uses SM80, while Hopper uses SM90, so you should change the SM setting in the CMake file to compile correctly.

Wazrrr avatar Aug 11 '25 20:08 Wazrrr