LaaZa
> May I ask under what circumstances would the quantization process suddenly exit the program without any error prompts?

I'm not sure, but you might be running out of memory....
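If it really is memory, the process is usually killed by the kernel OOM killer, so there is no Python traceback at all. A quick way to check is to log process memory around the quantization call. This is only a sketch: `psutil` is assumed to be installed and the quantize call is a placeholder, not the actual AutoGPTQ API call you are using.

```python
import logging

import psutil
import torch

logging.basicConfig(level=logging.INFO)
proc = psutil.Process()  # current process

def log_memory(tag: str) -> None:
    # Resident CPU memory of this process plus currently allocated CUDA memory.
    rss_gb = proc.memory_info().rss / 1024**3
    cuda_gb = torch.cuda.memory_allocated() / 1024**3 if torch.cuda.is_available() else 0.0
    logging.info("%s: RSS %.2f GiB, CUDA %.2f GiB", tag, rss_gb, cuda_gb)

log_memory("before quantization")
# model.quantize(examples)  # placeholder for your actual quantization call
log_memory("after quantization")
```

If the kernel killed the process, `dmesg` will usually also show an `Out of memory: Killed process ...` line.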
I think this is also what happened with a sharded model that was made with PR #364.
Without modifying AutoGPTQ code you can try this:

```python
import torch, auto_gptq
from transformers import AutoModel, AutoTokenizer
from auto_gptq.modeling._base import BaseGPTQForCausalLM
import logging

logging.basicConfig(
    format="%(asctime)s %(levelname)s [%(name)s] %(message)s",
    level=logging.INFO,
    datefmt="%Y-%m-%d...
```
Okay, I can't see the module shapes because the model is not in safetensors format. Try to update to the very latest auto_gptq from git; it should have a fix...
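For reference, once the checkpoint is in safetensors format the tensor shapes can be read straight from the file header without loading any weights. A minimal sketch, assuming the `safetensors` package is installed and using a hypothetical `model.safetensors` path:

```python
from safetensors import safe_open

# Only the header is parsed; the weights themselves are never loaded into memory.
with safe_open("model.safetensors", framework="pt") as f:
    for name in f.keys():
        tensor_slice = f.get_slice(name)
        print(name, tensor_slice.get_shape())
```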
Do you have commit https://github.com/AutoGPTQ/AutoGPTQ/commit/b4b801c6d37cbd210a2f36579fc09d2915b72f22 ?
Oh, I think it only affected outfeatures. Some of the modules do not have infeatures divisible by 32. It seems to be fc and fc2, but that's pretty bad because they...
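To see which layers are affected, you can walk the model and flag every `nn.Linear` whose dimensions are not divisible by 32. Rough sketch only: the model path is a placeholder, `accelerate` is assumed to be installed, and the empty-weights trick is just to avoid allocating the full 13B model for a shape check.

```python
import torch.nn as nn
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical path; replace with the checkpoint you are quantizing.
config = AutoConfig.from_pretrained("path/to/model", trust_remote_code=True)

# Instantiate on the "meta" device so no weight memory is actually allocated.
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        if module.in_features % 32 or module.out_features % 32:
            print(f"{name}: in_features={module.in_features}, "
                  f"out_features={module.out_features} (not divisible by 32)")
```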
You are now quantizing only a very small portion of the model, making it almost pointless. The model seems to have a lot of issues, especially with quantization: https://huggingface.co/haouarin/jais-13b-chat-GPTQ-4bits vllm...
> Hi @LaaZa
> I have started the quantization based on this PR. It's at 27/40 now. Shall I also test https://github.com/AutoGPTQ/AutoGPTQ/pull/625 or will this PR have the lats...
I think this is ready for retesting with transformers 4.40.0
My review isn't going to help here. I can't merge, and I don't have the memory to even try this. Actually, I'm not even quite sure where the reference model...