How to run on A100 40G?
When I run ./test_compute ../config_all/llama3-8B/1024.json directly, I get "Got bad cuda status: out of memory at line: 27/root/Nanoflow/pipeline/src/vortexData.cu". Changing model_configs.allocate_kv_data_batch in the config to 100 gives Segmentation fault (core dumped), and reducing the values in pipeline_configs also ends in Segmentation fault (core dumped) (see the sketch below for the kind of edit I made).
I want to know whether there are any rules for how to configure it when using different kinds of GPUs.
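For reference, this is roughly the fragment of 1024.json I edited (only the relevant part is shown, and the value here is just what I tried; I made similar reductions to the values under pipeline_configs):

```json
{
  "model_configs": {
    "allocate_kv_data_batch": 100
  }
}
```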
the same question...
I get the same error on a 4090 24G: "Got bad cuda status: out of memory at line: 27/ai/zhiyi/w/multimodal/openbmb/Nanoflow/pipeline/src/vortexData.cu".
We have upgraded our codebase from C++ to Python, and the configuration is now clearer, so you can explicitly define the KV cache capacity in your own code. Also note that the A100 uses SM80 while Hopper uses SM90, so you should change the SM setting in the CMake file to compile correctly.
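For the SM change, here is a minimal sketch of what to look for, assuming the build uses standard CMake CUDA support (where exactly NanoFlow's CMakeLists.txt sets this may differ, so check the file itself):

```cmake
# A100 is Ampere (SM80); Hopper GPUs such as H100 are SM90.
# CMAKE_CUDA_ARCHITECTURES is the standard CMake (>= 3.18) way to select the
# target GPU architecture; adjust whichever equivalent setting the project uses.
set(CMAKE_CUDA_ARCHITECTURES 80)
```

If the build passes flags to nvcc directly instead, the equivalent change is something like -gencode arch=compute_80,code=sm_80 in place of the SM90 variant.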