vllm
When will ChatGLM2-6B be supported?
ChatGLM-6B (and ChatGLM2-6B) is a very popular Chinese LLM. Do you have a plan to support it?
mark
+1, wanted too
mark
mark
mark
+1
+1
+1
Mark
mark
mark
mark
+1
mark
+1
+1
If anyone is familiar with the ChatGLM model architecture, feel free to help on #625. I am new to the transformer architecture and not sure if my changes are correct.
+1
mark
mark
mark
If anyone has the bandwidth to help us implement ChatGLM support, please leave a comment and coordinate here: https://github.com/vllm-project/vllm/issues/1552
When I use vLLM to load a ChatGLM2 model that was trained with `quantization_bit 8`, it does not seem to be supported. The code below is the original code from ChatGLM2's `modeling_chatglm.py`. Could anyone add this feature to vLLM? Thanks.
```python
# Excerpt from ChatGLM2-6B's modeling_chatglm.py
class ChatGLMForConditionalGeneration(ChatGLMPreTrainedModel):
    def __init__(self, config: ChatGLMConfig, empty_init=True, device=None):
        super().__init__(config)

        self.max_sequence_length = config.max_length
        self.transformer = ChatGLMModel(config, empty_init=empty_init, device=device)
        self.config = config
        self.quantized = False

        if self.config.quantization_bit:
            self.quantize(self.config.quantization_bit, empty_init=True)

    def quantize(self, bits: int, empty_init=False, device=None, **kwargs):
        if bits == 0:
            return
        from .quantization import quantize
        if self.quantized:
            logger.info("Already quantized.")
            return self
        self.quantized = True
        self.config.quantization_bit = bits
        self.transformer.encoder = quantize(self.transformer.encoder, bits,
                                            empty_init=empty_init, device=device,
                                            **kwargs)
        return self
```
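Not an answer, but a small sketch of how such a checkpoint can be recognized up front. The local path is hypothetical, and the check relies only on the `quantization_bit` config field shown above:

```python
from transformers import AutoConfig

# Hypothetical pre-flight check (the path below is illustrative):
# ChatGLM2 checkpoints trained with `quantization_bit` record that value
# in their config, so it can be detected before handing the model to vLLM.
config = AutoConfig.from_pretrained("/path/to/chatglm2-6b-int8", trust_remote_code=True)

bits = getattr(config, "quantization_bit", 0)
if bits:
    print(f"Checkpoint is quantized to {bits} bits; "
          f"vLLM may not load it, so fall back to the original modeling code.")
```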
ChatGLM support was added in https://github.com/vllm-project/vllm/pull/1261
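For anyone arriving here later, a minimal sketch of loading ChatGLM2-6B once that PR is in. The model name and sampling settings are illustrative, not the only supported configuration:

```python
from vllm import LLM, SamplingParams

# ChatGLM2-6B ships custom modeling/tokenizer code on the Hub,
# so trust_remote_code is required.
llm = LLM(model="THUDM/chatglm2-6b", trust_remote_code=True)

params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Give a one-sentence introduction to vLLM."], params)
print(outputs[0].outputs[0].text)
```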