[Bug]: Can't seem to disable enforcement of Eager mode.
Your current environment
The output of `python env.py`
```text
Currently using uv instead of conda for various personal reasons. DESPITE THIS, Aphrodite works perfectly fine. This error is not related to my choice of venv manager.
```
Model Input Dumps
No response
🐛 Describe the bug
Running Aphrodite from the shell like this:
aphrodite run /media/llm/Theia-21B-v2 --served-model-name="bot" --max-model-len 32768 --quantization fp4 --enforce-eager=false --quant-llm-exp-bits 2 --enable-prefix-caching --context-shift --gpu-memory-utilization 0.94 --enable-chunked-prefill=true --api-keys="sk-inf-ladidadida-nokeyforyou"
results in Enforce Eager mode being "True", despite the opposite being set. I am capable of fitting this model into my 3090 using the quantization options, but I am unable to ultimately disable eager mode enforcement. I can't imagine that Aphrodite has any precognition of the memory conditions prior to profiling to justify enabling it.
Is there any way to actually disable this feature so I can take advantage of the async output processing? Or is this never going to work on a single GPU? It's not even a documented feature at this point, as far as I can tell.
fpX quants currently enforce eager mode, due to a bug in their kernels. This will be addressed soon, but maybe we should log this behaviour. Thanks for reporting.
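For illustration, the logging suggested above could look roughly like the sketch below. This is not Aphrodite's actual code: `resolve_enforce_eager`, the `eager_only` parameter, and the logger name are all hypothetical, and the default set contains only `fp4` because that is the one fpX value that appears in this report.

```python
import logging

logger = logging.getLogger("eager_override_sketch")  # hypothetical logger name


def resolve_enforce_eager(quantization, requested, eager_only=("fp4",)):
    """Return the effective enforce_eager value (hypothetical helper).

    quantization: quantization method name, or None if unquantized.
    requested:    the value the user passed for --enforce-eager.
    eager_only:   assumed list of methods whose kernels require eager mode.
    """
    if quantization in eager_only and not requested:
        # Tell the user the flag was overridden instead of silently ignoring it.
        logger.warning(
            "enforce_eager=False was requested, but quantization=%s currently "
            "requires eager mode; forcing enforce_eager=True.",
            quantization,
        )
        return True
    return requested
```

With a warning like this, a run such as the one in the report would state at startup that `--enforce-eager=false` was overridden and why, instead of only showing the final value as `True`.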