LaaZa

Results: 113 comments of LaaZa

@Qubitium Yes, grouping them similarly to Mixtral should speed things up significantly.
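A minimal sketch of what "grouping them similar to Mixtral" means here, assuming AutoGPTQ's `inside_layer_modules` convention: module names in the same inner list are quantized together from one batch of captured activations, so collecting all experts' matching projections into a single inner list avoids one pass per expert. The module names below are illustrative, modeled on Mixtral's layout, not taken from this model.

```python
# Illustrative only: modules in one inner list of inside_layer_modules are
# quantized together in AutoGPTQ; names are modeled on Mixtral's layout.
num_experts = 8

# One expert per inner list -> one quantization pass per expert (slow).
ungrouped = [[f"block_sparse_moe.experts.{i}.w1"] for i in range(num_experts)]

# All experts' matching projection in a single inner list -> one pass total.
grouped = [
    [f"block_sparse_moe.experts.{i}.w1" for i in range(num_experts)],
]
```

The speedup comes from sharing the activation capture across the whole group instead of re-running it per expert.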

> @Qubitium must `router.layer` be included? Mixtral did not include it (Mixtral names it `gate`). And should the normalization parameter be included? I think they shouldn't, even if fixing...

> > I think they shouldn't even if fixing padding allows for it. Transformers/Optimum would likely need to ignore both. Quantizing normalization is generally not done anyway.
>
> What...

Take a look at #439 if that helps. It's going to be slow though, as it happens on the CPU.

Here is the updated fork, but the Transformers cache has changed and it is broken: https://github.com/LaaZa/AutoGPTQ/tree/support-group-query-attention I'm just not going to work on this for now, at least. I feel it's not worth...

Make sure you have a proper calibration dataset that is similar to the training dataset.

> @LaaZa Hello! Have you resolved this issue? I'm trying to follow the quick tour to get a 4-bit model from `internlmxcomposer2`, but I encountered a TypeError stating that 'internlmxcomposer2...

> @LaaZa I can edit the model_type to load the model, but I'm not sure how to determine if the model is quantized. I also tried the `bitsandbytes` package, and...

`use_marlin=True` should be passed to `from_quantized` when loading a quantized Marlin-format model. When quantizing, set `is_marlin_format=True` in the `quantize_config` instead to produce a Marlin-format checkpoint.
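A sketch of the two switches side by side, assuming AutoGPTQ's documented parameter names; the kwargs are collected in plain dicts here so the distinction is explicit, and would be consumed by `BaseQuantizeConfig(**...)` and `AutoGPTQForCausalLM.from_quantized(path, **...)` respectively.

```python
# Quantization time: the Marlin flag lives inside the quantize config.
quantize_config_kwargs = {
    "bits": 4,
    "group_size": 128,
    "is_marlin_format": True,  # write the checkpoint in Marlin format
}

# Load time: the Marlin flag is a from_quantized() keyword instead.
load_kwargs = {
    "device": "cuda:0",
    "use_marlin": True,  # run inference with the Marlin kernel
}

# BaseQuantizeConfig(**quantize_config_kwargs)
# AutoGPTQForCausalLM.from_quantized(model_dir, **load_kwargs)
```

Mixing the two up (e.g. putting `use_marlin` in the quantize config) is a common source of errors, since each flag is only recognized on its own side.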

You also need to add the model type in a few places. Look at this as an example: https://github.com/AutoGPTQ/AutoGPTQ/pull/481/files
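A hedged sketch of the registration pattern that PR follows, using AutoGPTQ's model-class conventions. The class and module names below are hypothetical (a made-up `MyModelGPTQForCausalLM` with Llama-style module paths); the real class would subclass `BaseGPTQForCausalLM`, and the linked PR files are the authoritative list of what to touch.

```python
# Hypothetical model definition mirroring auto_gptq/modeling conventions;
# in real code this subclasses BaseGPTQForCausalLM.
class MyModelGPTQForCausalLM:
    layer_type = "MyModelDecoderLayer"            # decoder layer class name
    layers_block_name = "model.layers"            # path to the decoder stack
    outside_layer_modules = ["model.embed_tokens", "model.norm"]
    inside_layer_modules = [                      # quantized together per list
        ["self_attn.k_proj", "self_attn.v_proj", "self_attn.q_proj"],
        ["self_attn.o_proj"],
        ["mlp.gate_proj", "mlp.up_proj"],
        ["mlp.down_proj"],
    ]

# Beyond the class itself, the model type string also has to be added to
# SUPPORTED_MODELS in _const.py and mapped in GPTQ_CAUSAL_LM_MODEL_MAP in
# auto.py (file names as of current AutoGPTQ; check the PR for specifics).
```

The inner lists of `inside_layer_modules` matter for both correctness and speed: each list is quantized from one shared activation capture.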