LaaZa

Results: 113 comments of LaaZa

@Qubitium Yes, grouping them similarly to Mixtral should speed things up significantly.
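A minimal sketch of what "grouping them similar to Mixtral" means here, assuming AutoGPTQ's `inside_layer_modules` convention: module names in the same inner list are quantized together from one batch of captured activations, so collecting all experts' matching projections into a single inner list avoids one pass per expert. The module names below are illustrative, modeled on Mixtral's layout, not taken from this model.

```python
# Illustrative only: modules in one inner list of inside_layer_modules are
# quantized together in AutoGPTQ; names are modeled on Mixtral's layout.
num_experts = 8

# One expert per inner list -> one quantization pass per expert (slow).
ungrouped = [[f"block_sparse_moe.experts.{i}.w1"] for i in range(num_experts)]

# All experts' matching projection in a single inner list -> one pass total.
grouped = [
    [f"block_sparse_moe.experts.{i}.w1" for i in range(num_experts)],
]
```

The speedup comes from sharing the activation capture across the whole group instead of re-running it per expert.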

> @Qubitium must `router.layer` be included? Mixtral did not include it (Mixtral names it `gate`). And should the normalization parameter be included? I think they shouldn't, even if fixing...

> > I think they shouldn't even if fixing padding allows for it. Transformers/Optimum would likely need to ignore both. Quantizing normalization is generally not done anyway.
>
> What...

Take a look at #439 if that helps. It's going to be slow though, as it happens on the CPU.

Here is the updated fork, but the Transformers cache has changed and it is broken: https://github.com/LaaZa/AutoGPTQ/tree/support-group-query-attention I'm just not going to work on this for now, at least. I feel it's not worth...

Make sure you have a proper calibration dataset that is similar to the training dataset.

> @LaaZa Hello! Have you resolved this issue? I'm trying to follow the quick tour to get a 4-bit model from `internlmxcomposer2`, but I encountered a TypeError stating that 'internlmxcomposer2...

> @LaaZa I can edit the model_type to load the model, but I'm not sure how to determine if the model is quantized. I also tried the `bitsandbytes` package, and...

`use_marlin=True` should be passed to `from_quantized` when loading a quantized Marlin-format model. When quantizing, set `is_marlin_format=True` in the `quantize_config` instead to produce a Marlin-format checkpoint.
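A sketch of the two switches side by side, assuming AutoGPTQ's documented parameter names; the kwargs are collected in plain dicts here so the distinction is explicit, and would be consumed by `BaseQuantizeConfig(**...)` and `AutoGPTQForCausalLM.from_quantized(path, **...)` respectively.

```python
# Quantization time: the Marlin flag lives inside the quantize config.
quantize_config_kwargs = {
    "bits": 4,
    "group_size": 128,
    "is_marlin_format": True,  # write the checkpoint in Marlin format
}

# Load time: the Marlin flag is a from_quantized() keyword instead.
load_kwargs = {
    "device": "cuda:0",
    "use_marlin": True,  # run inference with the Marlin kernel
}

# BaseQuantizeConfig(**quantize_config_kwargs)
# AutoGPTQForCausalLM.from_quantized(model_dir, **load_kwargs)
```

Mixing the two up (e.g. putting `use_marlin` in the quantize config) is a common source of errors, since each flag is only recognized on its own side.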

You also need to add the model type in a few places. Look at this as an example: https://github.com/AutoGPTQ/AutoGPTQ/pull/481/files
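A hedged sketch of the registration pattern that PR follows, using AutoGPTQ's model-class conventions. The class and module names below are hypothetical (a made-up `MyModelGPTQForCausalLM` with Llama-style module paths); the real class would subclass `BaseGPTQForCausalLM`, and the linked PR files are the authoritative list of what to touch.

```python
# Hypothetical model definition mirroring auto_gptq/modeling conventions;
# in real code this subclasses BaseGPTQForCausalLM.
class MyModelGPTQForCausalLM:
    layer_type = "MyModelDecoderLayer"            # decoder layer class name
    layers_block_name = "model.layers"            # path to the decoder stack
    outside_layer_modules = ["model.embed_tokens", "model.norm"]
    inside_layer_modules = [                      # quantized together per list
        ["self_attn.k_proj", "self_attn.v_proj", "self_attn.q_proj"],
        ["self_attn.o_proj"],
        ["mlp.gate_proj", "mlp.up_proj"],
        ["mlp.down_proj"],
    ]

# Beyond the class itself, the model type string also has to be added to
# SUPPORTED_MODELS in _const.py and mapped in GPTQ_CAUSAL_LM_MODEL_MAP in
# auto.py (file names as of current AutoGPTQ; check the PR for specifics).
```

The inner lists of `inside_layer_modules` matter for both correctness and speed: each list is quantized from one shared activation capture.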