aphrodite-engine Add Olmo2

Ported from vLLM

Dec 22 '24 15:12 fizzAI

Can you add it to tests/weight_loading/models.txt too? Thanks.

Dec 22 '24 16:12 AlpinDale

Is there a way to add an entry to it without quantization? All the current entries in there have some random quant attached to them.

Dec 22 '24 16:12 fizzAI

Looks like it needs transformers>=4.47.0. Am I good to bump the version in the PR?

Dec 22 '24 18:12 fizzAI

Other than a TF mismatch when installing Aphrodite, it seems to work fine.

Dec 22 '24 19:12 fizzAI

Running this PR with the latest main branch merged gives this error (tensor_parallel_size=2):

  File "/home/ubuntu/aphrodite-engine/aphrodite/modeling/models/olmo2.py", line 156, in forward
    q, k = self._apply_qk_norm(q, k)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/aphrodite-engine/aphrodite/modeling/models/olmo2.py", line 138, in _apply_qk_norm
    q = self.q_norm.forward_native(q)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/aphrodite-engine/aphrodite/modeling/layers/layernorm.py", line 65, in forward_native
    x = x.to(orig_dtype) * self.weight
        ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~
RuntimeError: The size of tensor a (2732) must match the size of tensor b (4096) at non-singleton dimension 1

Works fine with 1 GPU.
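The shapes in the traceback suggest that q_norm's weight spans the full q-projection width (4096), while with tensor_parallel_size=2 each rank only holds its own shard of q, so the elementwise multiply in forward_native cannot broadcast. The pure-Python sketch below reproduces that failure on a single process and shows one common workaround: all-gather q (and likewise k) across ranks, normalize over the full width, then keep the local slice. It is an illustration under assumptions, not the fix that landed in this PR. The helpers rms_norm, shard, all_gather and gathered_qk_norm are hypothetical, and all_gather just concatenates lists instead of calling a real collective.

```python
import math


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """RMSNorm over a 1-D vector: x / sqrt(mean(x**2) + eps) * weight."""
    if len(x) != len(weight):
        # Same failure mode as the traceback: activation and weight widths differ.
        raise ValueError(
            f"size of x ({len(x)}) must match size of weight ({len(weight)})"
        )
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]


def shard(x: list[float], tp_size: int) -> list[list[float]]:
    """Split x into tp_size equal contiguous chunks, like a column-parallel q_proj."""
    if len(x) % tp_size:
        raise ValueError("width must be divisible by tp_size")
    n = len(x) // tp_size
    return [x[r * n:(r + 1) * n] for r in range(tp_size)]


def all_gather(shards: list[list[float]]) -> list[float]:
    """Single-process stand-in for an all-gather: concatenate every rank's shard."""
    return [v for s in shards for v in s]


def gathered_qk_norm(
    q_shards: list[list[float]], weight: list[float], rank: int, tp_size: int
) -> list[float]:
    """Gather the full q, normalize with the full-width weight, keep this rank's slice."""
    full_q = all_gather(q_shards)
    return shard(rms_norm(full_q, weight), tp_size)[rank]


# Toy sizes (hypothetical): a width-8 q projection split across 2 ranks.
hidden, tp_size = 8, 2
q = [float(i + 1) for i in range(hidden)]
weight = [1.0 + 0.1 * i for i in range(hidden)]

reference = rms_norm(q, weight)  # what a single GPU computes
q_shards = shard(q, tp_size)     # what each TP rank actually holds

# 1) Naive: each rank applies the full-width weight to its half-width shard.
try:
    rms_norm(q_shards[0], weight)
    naive_failed = False
except ValueError:
    naive_failed = True

# 2) Slicing the weight avoids the error, but the RMS is taken over only half
#    the features, so the result silently differs from the single-GPU output.
w_shards = shard(weight, tp_size)
local = all_gather([rms_norm(q_shards[r], w_shards[r]) for r in range(tp_size)])

# 3) Gather, normalize, split: every rank recovers its slice of the reference.
gathered = all_gather(
    [gathered_qk_norm(q_shards, weight, r, tp_size) for r in range(tp_size)]
)

print("naive call raised:", naive_failed)
print("local-only matches reference:", local == reference)
print("gathered matches reference:", gathered == reference)
```

Case 2 is the tempting shortcut: it removes the shape error but changes the output, because the RMS statistic is computed over the whole q vector. The gather step keeps the single-GPU numerics at the cost of extra communication per attention layer.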

Jan 08 '25 11:01 AlpinDale
