
When will ChatGLM2-6B be supported?

Open — liukaiyueyuo opened this issue on Jun 26, 2023

ChatGLM-6B (and ChatGLM2-6B) is a very popular Chinese LLM. Do you have a plan to support it?

liukaiyueyuo avatar Jun 26 '23 02:06 liukaiyueyuo

mark

zhaoying9105 avatar Jul 02 '23 13:07 zhaoying9105

+1, wanted too

bb2103 avatar Jul 04 '23 01:07 bb2103

mark

foxxxx001 avatar Jul 06 '23 16:07 foxxxx001

mark

irasin avatar Jul 11 '23 12:07 irasin

mark

zhuqn avatar Jul 12 '23 07:07 zhuqn

+1

BrightXiaoHan avatar Jul 14 '23 06:07 BrightXiaoHan

+1

wuxiy avatar Jul 14 '23 07:07 wuxiy

+1

wuxiy avatar Jul 15 '23 10:07 wuxiy

Mark

binarrii avatar Jul 15 '23 11:07 binarrii

mark

Oliver-ss avatar Jul 18 '23 03:07 Oliver-ss

mark

felixstander avatar Jul 18 '23 08:07 felixstander

mark

akxxsb avatar Jul 21 '23 03:07 akxxsb

+1

jinchihe avatar Jul 21 '23 10:07 jinchihe

mark

wang-benqiang avatar Jul 22 '23 00:07 wang-benqiang

+1

iDonal avatar Jul 28 '23 10:07 iDonal

+1

youthbupt avatar Jul 30 '23 03:07 youthbupt

If anyone is familiar with the ChatGLM model architecture, feel free to help on #625. I am new to the transformer architecture and not sure if my changes are correct.

Jeffwan avatar Jul 31 '23 18:07 Jeffwan

+1

jinghai avatar Aug 24 '23 14:08 jinghai

mark

x22x22 avatar Aug 28 '23 06:08 x22x22

mark

datalee avatar Sep 04 '23 00:09 datalee

mark

callanwu avatar Oct 19 '23 03:10 callanwu

If anyone has the bandwidth to help us implement ChatGLM support, please leave a comment and coordinate here: https://github.com/vllm-project/vllm/issues/1552

simon-mo avatar Nov 02 '23 18:11 simon-mo

When I use vLLM to load a ChatGLM2 model that was trained with "quantization_bit 8", it does not seem to be supported. The code below is the original code from ChatGLM2's "modeling_chatglm.py". Could anyone add this feature to vLLM? Thanks.

```python
class ChatGLMForConditionalGeneration(ChatGLMPreTrainedModel):
    def __init__(self, config: ChatGLMConfig, empty_init=True, device=None):
        super().__init__(config)

        self.max_sequence_length = config.max_length
        self.transformer = ChatGLMModel(config, empty_init=empty_init, device=device)
        self.config = config
        self.quantized = False

        if self.config.quantization_bit:
            self.quantize(self.config.quantization_bit, empty_init=True)

    def quantize(self, bits: int, empty_init=False, device=None, **kwargs):
        if bits == 0:
            return

        from .quantization import quantize

        if self.quantized:
            logger.info("Already quantized.")
            return self

        self.quantized = True

        self.config.quantization_bit = bits

        self.transformer.encoder = quantize(self.transformer.encoder, bits,
                                            empty_init=empty_init, device=device,
                                            **kwargs)
        return self
```

chenyangjun45 avatar Dec 05 '23 03:12 chenyangjun45
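
For readers hitting the same limitation, one way to tell ahead of time whether a checkpoint will take this quantize path is to inspect its config before handing it to vLLM. A minimal sketch, assuming a ChatGLM2-style checkpoint (the path below is a placeholder for your own directory or Hub id) whose config carries the `quantization_bit` value saved at training time:

```python
from transformers import AutoConfig

# Placeholder checkpoint path; substitute your own directory or Hub id.
ckpt = "path/to/chatglm2-6b-quantized"

# ChatGLM2 ships custom config/model code, so trust_remote_code is required.
config = AutoConfig.from_pretrained(ckpt, trust_remote_code=True)

# A non-zero quantization_bit means __init__ above calls self.quantize(...),
# i.e. the weights are stored in the model's own int8/int4 format, which the
# thread above reports vLLM's ChatGLM loader does not handle; use an
# unquantized checkpoint in that case.
bits = getattr(config, "quantization_bit", 0)
print(f"quantization_bit = {bits}")
```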

ChatGLM support was added in https://github.com/vllm-project/vllm/pull/1261

hmellor avatar Mar 06 '24 16:03 hmellor
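
For anyone arriving here after that PR, a minimal usage sketch (assuming a vLLM build that includes it and the `THUDM/chatglm2-6b` checkpoint on the Hugging Face Hub; `trust_remote_code=True` is needed because the model ships custom code):

```python
from vllm import LLM, SamplingParams

# ChatGLM2-6B uses custom modeling code on the Hub, hence trust_remote_code.
llm = LLM(model="THUDM/chatglm2-6b", trust_remote_code=True)
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# Prompt: "Hello, please introduce yourself in one sentence."
outputs = llm.generate(["你好，请用一句话介绍一下你自己。"], sampling)
print(outputs[0].outputs[0].text)
```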