aphrodite-engine
Add Olmo2
Ported from vLLM
Can you add it to tests/weight_loading/models.txt too? Thanks
Is there a way to add something to it without quantization? All the current ones in there have some random quant attached to them
Looks like it needs transformers>=4.47.0, am I good to bump the version in the PR?
Other than a transformers (TF) version mismatch when installing Aphrodite, it seems to work fine.
Running this PR with the latest main branch merged gives this error (tensor_parallel_size=2):
  File "/home/ubuntu/aphrodite-engine/aphrodite/modeling/models/olmo2.py", line 156, in forward
    q, k = self._apply_qk_norm(q, k)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/aphrodite-engine/aphrodite/modeling/models/olmo2.py", line 138, in _apply_qk_norm
    q = self.q_norm.forward_native(q)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/aphrodite-engine/aphrodite/modeling/layers/layernorm.py", line 65, in forward_native
    x = x.to(orig_dtype) * self.weight
        ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~
RuntimeError: The size of tensor a (2732) must match the size of tensor b (4096) at non-singleton dimension 1
Works fine with 1 GPU.