
Fix a bug when calculating `neuron_cap` before invoking the solver

Open KiritoHugh opened this issue 1 year ago • 0 comments

For example, take ReluLLaMA-7B on an NVIDIA GeForce RTX 2080 Ti (11264 MiB). `ffn_up`, `ffn_gate`, and `ffn_down_t` all have shape [4096, 11008]. A neuron should be a [4096, 1] slice, not a [1, 11008] slice.

Running `env CUDA_VISIBLE_DEVICES=0 ./build/bin/main -m ./ReluLLaMA-7B/llama-7b-relu.powerinfer.gguf -n 128 -t 8 -p "Once upon a time"` prints:

  • before revising: slice_size=22016 vram_bytes_per_slice=99072 vram_allocatable_bytes=4212178944 neuron_cap=170064

  • after revising: slice_size=8192 vram_bytes_per_slice=24576 vram_allocatable_bytes=4212178944 neuron_cap=171394
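As a sanity check, the reported numbers can be reproduced with a few lines of arithmetic. This is only a sketch: the 2-byte element size (fp16) and the relationships between the printed values are inferred from the log output above, not taken from the PowerInfer source, and the helper name `slice_bytes` is hypothetical.

```python
# Assumption: each weight element occupies 2 bytes (fp16).
BYTES_PER_ELEMENT = 2
HIDDEN_DIM, FFN_DIM = 4096, 11008  # tensor shape [4096, 11008] from the issue


def slice_bytes(elements_per_neuron: int) -> int:
    """Bytes for one neuron slice (hypothetical helper, for illustration)."""
    return elements_per_neuron * BYTES_PER_ELEMENT


slice_before = slice_bytes(FFN_DIM)     # neuron treated as [1, 11008]
slice_after = slice_bytes(HIDDEN_DIM)   # neuron treated as [4096, 1]
print(slice_before, slice_after)        # 22016 8192, matching slice_size in the log

# Values printed in the log.
VRAM_ALLOCATABLE = 4212178944
PER_SLICE_BEFORE, PER_SLICE_AFTER = 99072, 24576

# Ratio of vram_bytes_per_slice to slice_size in each run (inferred, not from the source).
print(PER_SLICE_BEFORE / slice_before)  # 4.5
print(PER_SLICE_AFTER / slice_after)    # 3.0

# The printed neuron_cap values are consistent with these integer expressions.
cap_before = (VRAM_ALLOCATABLE // PER_SLICE_BEFORE) * 4
cap_after = VRAM_ALLOCATABLE // PER_SLICE_AFTER
print(cap_before, cap_after)            # 170064 171394
```

The slice size follows the dimension chosen as "one neuron": 11008 elements give 22016 bytes, while 4096 elements give 8192 bytes, which is the change this issue proposes.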

KiritoHugh, Dec 10 '24 08:12