Guoli Yin
@byshiue will the FT op be on the roadmap for the next release? The TF op turns out to be faster than the th op in the decoder (decoding) benchmark and is easier...
candidate change:

```python
class ALiBi(Module):
    @staticmethod
    def create_alibi_matrix(
        q_sequence_length: int,
        k_sequence_length: int,
        num_heads: int,
        offset: int,
        dtype=mx.float32,
    ):
        x1 = mx.arange(offset, q_sequence_length)
        x2 = mx.arange(0, k_sequence_length)
        distance_matrix = -mx.abs(
            mx.expand_dims(x1[:, ...
```
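For context, here is a minimal, self-contained sketch of how such an ALiBi bias matrix is typically built. It is not the actual patch (the tail of the candidate change above is truncated); it assumes MLX array ops (`mx.arange`, `mx.abs`, `mx.power`) and the standard 2^(-8h/num_heads) slope schedule from the ALiBi paper, and the function name `alibi_bias_sketch` is hypothetical.

```python
import mlx.core as mx


def alibi_bias_sketch(q_len: int, k_len: int, num_heads: int, dtype=mx.float32):
    # Relative distances: entry [i, j] = -|i - j|, shape (q_len, k_len).
    q_pos = mx.arange(q_len)
    k_pos = mx.arange(k_len)
    distance = -mx.abs(mx.expand_dims(q_pos, 1) - mx.expand_dims(k_pos, 0))

    # Standard ALiBi slope schedule for power-of-two head counts:
    # slope_h = 2^(-8 * h / num_heads) for h = 1..num_heads.
    exponents = mx.arange(1, num_heads + 1) * (-8.0 / num_heads)
    slopes = mx.power(mx.array(2.0), exponents)  # shape (num_heads,)

    # Broadcast to (num_heads, q_len, k_len): each head scales the same distances.
    bias = slopes.reshape(num_heads, 1, 1) * distance
    return bias.astype(dtype)
```

The resulting bias is simply added to the attention logits before the softmax, which is why models like BLOOM and MPT need no learned positional embeddings when ALiBi is in place.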
+1. I think both BLOOM and MPT just need an ALiBi implementation, and then vLLM could support both.
@WoosukKwon can I ask about the timeline, since you mention it is coming very soon? Is it on a 2-week or a 4-week roadmap? Thanks.
> Hi Guoli, what's the use case? Should we first discuss in an internal PR?

Sounds good, let's discuss in an internal PR first.
cc @dongyin92
@chaunceyjiang thanks for adding this! May I ask whether this change is also compatible with MultiModalHasher? https://github.com/vllm-project/vllm/blob/084bbac8cc4c29b7dcd2098418168c61d3d42e9b/vllm/multimodal/hasher.py#L24 When we enable prefix caching, the image_embeds should be hashable as well, right?
@DarkLight1337 thanks for sharing. From the code, it looks like it will create a hash key from mm_data? And it will include the type of image_embeds as well, if I...
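To make the question concrete, here is a rough sketch of the kind of content-based hashing being discussed for prefix caching with embedding inputs. This is a hypothetical illustration, not vLLM's actual MultiModalHasher implementation (see the linked hasher.py for that); it assumes the embeddings arrive as NumPy arrays and uses only hashlib from the standard library.

```python
import hashlib
import numpy as np


def hash_mm_item_sketch(modality: str, item) -> str:
    """Hypothetical sketch: derive a stable cache key for a multimodal item
    (e.g. precomputed image_embeds) so prefix caching can reuse it."""
    h = hashlib.sha256()
    h.update(modality.encode("utf-8"))            # include the item type/modality
    if isinstance(item, np.ndarray):
        h.update(str(item.dtype).encode())        # dtype and shape disambiguate
        h.update(str(item.shape).encode())        # tensors with identical raw bytes
        h.update(np.ascontiguousarray(item).tobytes())
    else:
        h.update(repr(item).encode("utf-8"))      # fallback for plain values
    return h.hexdigest()


# Example: two identical embedding tensors map to the same cache key.
embeds = np.random.rand(1, 576, 1024).astype(np.float32)
assert hash_mm_item_sketch("image_embeds", embeds) == \
       hash_mm_item_sketch("image_embeds", embeds.copy())
```

The key point is that the hash must cover both the modality/type and the tensor contents; if image_embeds were excluded from the key, prefix caching could wrongly reuse cached blocks across different images.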