llm-awq
Suggestion: Add Bayesian optimization support for ratio search
Hi @trotsky1997, this looks very interesting! Have you conducted any experiments to measure perplexity after using Bayesian optimization?
> Hi @trotsky1997, this looks very interesting! Have you conducted any experiments to measure perplexity after using Bayesian optimization?

You can check my results at https://trotsky1997.notion.site/f49dcb79ab6245a7b689beed086e4c7b?pvs=4
@trotsky1997, does this code use different alpha values for X and W? You observed better perplexity with that.
That's very easy to modify: add a new parameter called ratio_b to the get_loss function, replace 1 - ratio with ratio_b, and then define ratio_b with its bounds in the parameter-space definition, as shown below.
```python
@scheduler.serial
def get_loss(ratio, ratio_b):
    nonlocal best_error, best_ratio, best_scales
    # ratio and ratio_b are sampled from uniform(0, 1) by the optimizer,
    # so the grid-search rescaling (ratio * 1 / n_grid) is no longer needed.
    scales = (x_max.pow(ratio) / w_max.pow(ratio_b)).clamp(min=1e-4).view(-1)
    scales = scales / (scales.max() * scales.min()).sqrt()
    for fc in linears2scale:
        # scale the weights up, quantize them, then divide the scales back out
        fc.weight.mul_(scales.view(1, -1).to(fc.weight.device))
        fc.weight.data = w_quantize_func(fc.weight.data) / scales.view(1, -1)
    out = block(x, **kwargs)
    if isinstance(out, tuple):
        out = out[0]
    loss = (org_out - out).float().pow(2).mean().item()  # float prevents overflow
    history.append(loss)
    is_best = loss < best_error
    if is_best:
        best_error = loss
        best_ratio = ratio
        best_scales = scales
    block.load_state_dict(org_sd)  # restore original weights for the next trial
    return loss

param_space = dict(ratio=uniform(0, 1), ratio_b=uniform(0, 1))
```
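If the snippet above is driven by ARM's mango library (an assumption based on the `@scheduler.serial` decorator and the scipy-style `uniform` distributions; the original patch does not name the optimizer), a minimal driver could look like the sketch below. The `num_iteration` budget is illustrative, not part of the original code.

```python
# Sketch of driving the search with mango's Bayesian optimizer.
from mango import Tuner, scheduler
from scipy.stats import uniform

param_space = dict(ratio=uniform(0, 1), ratio_b=uniform(0, 1))

# Illustrative trial budget, roughly comparable to the ~20 points
# evaluated by the original grid search over ratio.
conf = dict(num_iteration=40)

tuner = Tuner(param_space, get_loss, conf)
results = tuner.minimize()  # Bayesian optimization over (ratio, ratio_b)

print("best params:", results["best_params"])
print("best loss:", results["best_objective"])
```

With `@scheduler.serial`, mango calls `get_loss(**params)` once per trial, so the two-argument signature above matches the sampled parameter space directly.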
> @trotsky1997, does this code use different alpha values for X and W? You observed better perplexity with that.
I have talked with Dr. Tang; it performs a little better than grid search on Vicuna, but about the same as grid search on Llama-2-7B.