[Bug]: Can't seem to disable enforcement of Eager mode.

Open prolix-oc opened this issue 11 months ago • 1 comment

Your current environment

The output of `python env.py`

```text
Currently using uv instead of conda for various personal reasons. DESPITE THIS, Aphrodite works perfectly fine. This error is not related to my choice of venv manager.
```

Model Input Dumps

No response

🐛 Describe the bug

Running Aphrodite via its module from the shell, like so:

aphrodite run /media/llm/Theia-21B-v2 --served-model-name="bot" --max-model-len 32768 --quantization fp4 --enforce-eager=false --quant-llm-exp-bits 2 --enable-prefix-caching --context-shift --gpu-memory-utilization 0.94 --enable-chunked-prefill=true --api-keys="sk-inf-ladidadida-nokeyforyou"

results in enforce-eager mode being "True", even though `--enforce-eager=false` is set. I can fit this model on my 3090 using the quantization options, but I can't disable eager mode enforcement. I can't imagine that Aphrodite has any precognition of the memory conditions prior to profiling that would justify enabling it.

Is there any way to actually disable this feature so I can take advantage of the async output processing? Or is this never going to work on a single GPU? It's not even a documented feature at this point, as far as I can tell.

prolix-oc, Jan 14 '25 11:01

fpX quants currently enforce eager mode, due to a bug in their kernels. This will be addressed soon, but maybe we should log this behaviour. Thanks for reporting.

AlpinDale, Jan 23 '25 07:01
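
For illustration, the logging suggested in the reply above could look roughly like the sketch below. This is not Aphrodite's actual code: the function `resolve_enforce_eager`, the logger name, and the `fp<digits>` pattern used to decide what counts as an "fpX" quant are all hypothetical, chosen only to show an override that warns instead of silently ignoring `--enforce-eager=false`.

```python
import logging
import re
from typing import Optional

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("enforce_eager_sketch")

# Assumption: "fpX" means a quantization name of the form fp<digits>,
# such as the "fp4" used in the report. The real set of affected
# quantization methods may differ.
FPX_PATTERN = re.compile(r"^fp\d+$")


def resolve_enforce_eager(quantization: Optional[str], requested: bool) -> bool:
    """Return the effective enforce_eager value, warning on an override."""
    forced = quantization is not None and FPX_PATTERN.match(quantization) is not None
    if forced and not requested:
        # Tell the user why their setting is being ignored.
        logger.warning(
            "enforce_eager=False was requested, but %s quantization "
            "currently requires eager mode; overriding to True.",
            quantization,
        )
        return True
    return requested


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # Mirrors the reported command: fp4 quantization with eager mode disabled.
    print(resolve_enforce_eager("fp4", False))
```

With a warning like this, the override would show up in the server log at startup, so a user passing `--enforce-eager=false` could see the reason right away.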