
When will ChatGLM2-6B be supported?

Open — liukaiyueyuo opened this issue on Jun 26, 2023

ChatGLM-6B (and ChatGLM2-6B) is a very popular Chinese LLM. Do you have a plan to support it?

liukaiyueyuo avatar Jun 26 '23 02:06 liukaiyueyuo

mark

zhaoying9105 avatar Jul 02 '23 13:07 zhaoying9105

+1, wanted too

bb2103 avatar Jul 04 '23 01:07 bb2103

mark

foxxxx001 avatar Jul 06 '23 16:07 foxxxx001

mark

irasin avatar Jul 11 '23 12:07 irasin

mark

zhuqn avatar Jul 12 '23 07:07 zhuqn

+1

BrightXiaoHan avatar Jul 14 '23 06:07 BrightXiaoHan

+1

wuxiy avatar Jul 14 '23 07:07 wuxiy

+1

wuxiy avatar Jul 15 '23 10:07 wuxiy

Mark

binarrii avatar Jul 15 '23 11:07 binarrii

mark

Oliver-ss avatar Jul 18 '23 03:07 Oliver-ss

mark

felixstander avatar Jul 18 '23 08:07 felixstander

mark

akxxsb avatar Jul 21 '23 03:07 akxxsb

+1

jinchihe avatar Jul 21 '23 10:07 jinchihe

mark

wang-benqiang avatar Jul 22 '23 00:07 wang-benqiang

+1

iDonal avatar Jul 28 '23 10:07 iDonal

+1

youthbupt avatar Jul 30 '23 03:07 youthbupt

If anyone is familiar with the ChatGLM model architecture, feel free to help on #625. I am new to the transformer architecture and not sure if my changes are correct.

Jeffwan avatar Jul 31 '23 18:07 Jeffwan

+1

jinghai avatar Aug 24 '23 14:08 jinghai

mark

x22x22 avatar Aug 28 '23 06:08 x22x22

mark

datalee avatar Sep 04 '23 00:09 datalee

mark

callanwu avatar Oct 19 '23 03:10 callanwu

If anyone has the bandwidth to help us implement ChatGLM support, please leave a comment and coordinate here: https://github.com/vllm-project/vllm/issues/1552

simon-mo avatar Nov 02 '23 18:11 simon-mo

When I use vLLM to load a ChatGLM2 model that was trained with "quantization_bit 8", it does not seem to be supported. The code below is the original code from ChatGLM2's "modeling_chatglm.py". Could anyone add this feature to vLLM? Thanks.

```python
class ChatGLMForConditionalGeneration(ChatGLMPreTrainedModel):
    def __init__(self, config: ChatGLMConfig, empty_init=True, device=None):
        super().__init__(config)

        self.max_sequence_length = config.max_length
        self.transformer = ChatGLMModel(config, empty_init=empty_init, device=device)
        self.config = config
        self.quantized = False

        if self.config.quantization_bit:
            self.quantize(self.config.quantization_bit, empty_init=True)

    def quantize(self, bits: int, empty_init=False, device=None, **kwargs):
        if bits == 0:
            return

        from .quantization import quantize

        if self.quantized:
            logger.info("Already quantized.")
            return self

        self.quantized = True

        self.config.quantization_bit = bits

        self.transformer.encoder = quantize(self.transformer.encoder, bits,
                                            empty_init=empty_init, device=device,
                                            **kwargs)
        return self
```

chenyangjun45 avatar Dec 05 '23 03:12 chenyangjun45
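
For readers hitting the same limitation, one way to tell ahead of time whether a checkpoint will take this quantize path is to inspect its config before handing it to vLLM. A minimal sketch, assuming a ChatGLM2-style checkpoint (the path below is a placeholder for your own directory or Hub id) whose config carries the `quantization_bit` value saved at training time:

```python
from transformers import AutoConfig

# Placeholder checkpoint path; substitute your own directory or Hub id.
ckpt = "path/to/chatglm2-6b-quantized"

# ChatGLM2 ships custom config/model code, so trust_remote_code is required.
config = AutoConfig.from_pretrained(ckpt, trust_remote_code=True)

# A non-zero quantization_bit means __init__ above calls self.quantize(...),
# i.e. the weights are stored in the model's own int8/int4 format, which the
# thread above reports vLLM's ChatGLM loader does not handle; use an
# unquantized checkpoint in that case.
bits = getattr(config, "quantization_bit", 0)
print(f"quantization_bit = {bits}")
```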

ChatGLM support was added in https://github.com/vllm-project/vllm/pull/1261

hmellor avatar Mar 06 '24 16:03 hmellor
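
For anyone arriving here after that PR, a minimal usage sketch (assuming a vLLM build that includes it and the `THUDM/chatglm2-6b` checkpoint on the Hugging Face Hub; `trust_remote_code=True` is needed because the model ships custom code):

```python
from vllm import LLM, SamplingParams

# ChatGLM2-6B uses custom modeling code on the Hub, hence trust_remote_code.
llm = LLM(model="THUDM/chatglm2-6b", trust_remote_code=True)
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# Prompt: "Hello, please introduce yourself in one sentence."
outputs = llm.generate(["你好，请用一句话介绍一下你自己。"], sampling)
print(outputs[0].outputs[0].text)
```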