Guoli Yin

Results: 12 comments by Guoli Yin

@byshiue will the FT op be on the roadmap for the next release? The TF op turns out to be faster than the th op in the decoder (decoding) benchmark and is easier...

> > @byshiue will the FT op be in the roadmap for the next release? TF op turns out to be faster than th op from the decoder(decoding) benchmark and...

candidate change:

```python
class ALiBi(Module):
    @staticmethod
    def create_alibi_matrix(
        q_sequence_length: int,
        k_sequence_length: int,
        num_heads: int,
        offset: int,
        dtype=mx.float32,
    ):
        x1 = mx.arange(offset, q_sequence_length)
        x2 = mx.arange(0, k_sequence_length)
        distance_matrix = -mx.abs(
            mx.expand_dims(x1[:,...
```

+1. I think both BLOOM and MPT just need an ALiBi implementation, and then vLLM could support both.
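
For context, here is a minimal pure-Python sketch of the kind of bias an ALiBi implementation adds to the attention scores. It is not the vLLM or MLX code. The function names `alibi_slopes` and `alibi_bias` are hypothetical, the slope formula assumes a power-of-two head count, and the symmetric `-abs(q - k)` distance only mirrors the truncated snippet above (a causal model would additionally mask future positions).

```python
def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes 2^(-8*h/num_heads) for h = 1..num_heads.

    Assumption: num_heads is a power of two; other head counts need
    an interpolation scheme that this sketch does not cover.
    """
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * h / num_heads) for h in range(1, num_heads + 1)]


def alibi_bias(q_len: int, k_len: int, num_heads: int, offset: int = 0):
    """Return a nested list indexed as bias[head][query][key].

    Query positions run from `offset` to q_len - 1, so `offset` lets a
    decoder compute only the rows for newly generated tokens.
    """
    return [
        [
            # Penalty grows linearly with the query/key distance.
            [-slope * abs(q - k) for k in range(k_len)]
            for q in range(offset, q_len)
        ]
        for slope in alibi_slopes(num_heads)
    ]


if __name__ == "__main__":
    bias = alibi_bias(q_len=4, k_len=4, num_heads=2)
    print(bias[0][3])  # last query row of the first head
```

The bias is added to the raw attention scores before the softmax, so no learned positional embedding is needed, which is why a single shared implementation could plausibly serve both model families.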

@WoosukKwon when you say "very soon", may I ask about the timeline? Is it on a 2-week or a 4-week roadmap? Thanks.

> Hi Guoli, what's the use case? Should we first discuss in an internal PR?

sg. Let's discuss in an internal PR first.

@chaunceyjiang thanks for adding this! May I ask whether this change is also compatible with MultiModalHasher? https://github.com/vllm-project/vllm/blob/084bbac8cc4c29b7dcd2098418168c61d3d42e9b/vllm/multimodal/hasher.py#L24 When prefix caching is enabled, shouldn't image_embeds be hashable as well?

@DarkLight1337 thanks for sharing. from the code, it looks like it will create a hash key from mm_data? and it will include the type of image_embeds as well if I...
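
To make the question concrete, here is a rough sketch of what a content hash that folds in the value's type could look like. This is not vLLM's `MultiModalHasher`; the function `hash_mm_item`, the use of SHA-256, and the serialization scheme are all assumptions chosen only for illustration.

```python
import hashlib


def hash_mm_item(key: str, value) -> str:
    """Hypothetical helper: derive a stable hex digest from a keyword,
    the value's type name, and the value's content."""
    hasher = hashlib.sha256()
    hasher.update(key.encode("utf-8"))
    # Including the type name means a list and a tuple with the same
    # elements produce different cache keys.
    hasher.update(type(value).__name__.encode("utf-8"))
    if isinstance(value, (bytes, bytearray)):
        hasher.update(bytes(value))
    elif isinstance(value, (list, tuple)):
        # Hash nested items recursively so order and content both matter.
        for item in value:
            hasher.update(hash_mm_item(key, item).encode("utf-8"))
    else:
        # Fallback for scalars; a real tensor would be hashed from its
        # raw bytes, shape, and dtype instead of repr().
        hasher.update(repr(value).encode("utf-8"))
    return hasher.hexdigest()


if __name__ == "__main__":
    print(hash_mm_item("image_embeds", [0.1, 0.2, 0.3]))
```

Under this scheme, two requests carrying identical embeddings map to the same key, which is the property prefix caching would rely on.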