[New Model]: OpenELM-3B
The model to consider.
The closest model vllm already supports.
No response
What's your difficulty of supporting the model you want?
OpenELM models use a different number of attention heads in each layer, so PagedAttention would need a KV cache whose shape varies per layer:
"num_kv_heads": [3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5],
"num_query_heads": [12, 12, 12, 12, 12, 16, 16, 16, 16, 16, 16, 16, 20, 20, 20, 20],
"num_transformer_layers": 16,
I previously made this feature request here: https://github.com/vllm-project/vllm/discussions/4350
This issue was closed and marked as completed, but I don't see it referencing a PR, nor do I see the model listed as a supported model. Was support actually added?
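To make the difficulty concrete, here is a rough sketch of how the KV cache footprint per block would differ from layer to layer with the head counts quoted above. It is not vLLM code: the function name `kv_block_bytes_per_layer` is hypothetical, and `HEAD_DIM`, `BLOCK_SIZE`, and `DTYPE_BYTES` are assumed values chosen only for illustration (the config excerpt does not state them).

```python
from typing import List

# Per-layer head counts copied from the config excerpt in this issue.
NUM_KV_HEADS: List[int] = [3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5]
NUM_QUERY_HEADS: List[int] = [12, 12, 12, 12, 12, 16, 16, 16, 16, 16, 16, 16, 20, 20, 20, 20]

# Assumed values, NOT taken from the issue: adjust to the real model/engine.
HEAD_DIM = 64      # assumed per-head dimension
BLOCK_SIZE = 16    # assumed number of tokens stored per cache block
DTYPE_BYTES = 2    # assumed 16-bit cache dtype


def kv_block_bytes_per_layer(
    num_kv_heads: List[int], head_dim: int, block_size: int, dtype_bytes: int
) -> List[int]:
    """Bytes needed for one cache block (keys + values) in each layer."""
    # Factor 2 accounts for storing both the key and the value tensors.
    return [2 * block_size * h * head_dim * dtype_bytes for h in num_kv_heads]


per_layer = kv_block_bytes_per_layer(NUM_KV_HEADS, HEAD_DIM, BLOCK_SIZE, DTYPE_BYTES)

# A cache that assumes one head count for all layers would have to pad
# every layer up to the largest count.
padded_total = len(NUM_KV_HEADS) * max(per_layer)
exact_total = sum(per_layer)

# Query heads per KV head in each layer (grouped-query ratio).
group_sizes = [q // kv for q, kv in zip(NUM_QUERY_HEADS, NUM_KV_HEADS)]

for layer, (nbytes, group) in enumerate(zip(per_layer, group_sizes)):
    print(f"layer {layer:2d}: {nbytes} bytes/block, {group} query heads per KV head")
print(f"exact total: {exact_total} bytes/block, padded to max: {padded_total} bytes/block")
```

Under these assumptions, the ratio of query heads to KV heads is the same in every layer, but the block size in bytes is not, which is why a single fixed cache shape either wastes memory through padding or needs per-layer handling.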