quanto
Quanto scale values seem unpopulated in quantized model
When loading a Mistral model, I noticed that the output_scale and input_scale values associated with the quantized tensors were just tensors with the value 1, i.e. tensor(1., device='cuda:0'). This seems incorrect: the model appears to be quantized correctly, so I would expect these variables to hold the scaling factors that were actually used. Is there a reason for this behavior?
Here is the code I used:
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from quanto import quantize  # freeze is also available

seed = 1
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
device = None
weights_type = "int8"
activations_type = "none"

torch.manual_seed(seed)
if device is None:
    if torch.cuda.is_available():
        device = torch.device("cuda")
        print("Using cuda device")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
    else:
        device = torch.device("cpu")
else:
    device = torch.device(device)

# keyword_to_qtype is a helper (defined elsewhere) that maps strings
# like "int8" / "none" to quanto qtypes or None
weights = keyword_to_qtype(weights_type)
activations = keyword_to_qtype(activations_type)

dtype = torch.float32 if device.type == "cpu" else torch.float16
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token_id = tokenizer.eos_token_id
tokenizer.padding_side = "left"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=dtype, low_cpu_mem_usage=True
).to(device)

if weights is not None or activations is not None:
    print("Quantizing")
    start = time.time()
    quantize(model, weights=weights, activations=activations)
    # freeze(model)
    print(f"Finished: {time.time() - start:.2f}")
The input and output scales are only used when activations is not None. They default to 1.0 and are only updated by going through a calibration phase. See https://github.com/huggingface/quanto?tab=readme-ov-file#quantization-workflow
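To illustrate what calibration computes, here is a minimal pure-PyTorch sketch (not quanto's actual implementation) of how an activation scale can be derived: before calibration the scale is a placeholder of 1.0, and running sample inputs through an observer replaces it with a value based on the observed range, e.g. a symmetric absmax scale for int8.

```python
import torch

def absmax_scale(x: torch.Tensor, qmax: int = 127) -> torch.Tensor:
    # symmetric int8 scale: map the largest observed magnitude to qmax
    return x.abs().max() / qmax

# before calibration the scale is just a placeholder, as in the issue
scale = torch.tensor(1.0)

# "calibration": observe sample activations and record their range
sample = torch.tensor([-6.35, 2.0, 5.08])
scale = absmax_scale(sample)
print(scale)  # tensor(0.0500)
```

In quanto this observation happens while running representative inputs through the model inside its calibration context, after which input_scale and output_scale are no longer 1.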