gpt-fast
fixing GPTQ
Stack from ghstack (oldest at bottom):
- -> #148
Summary:
Trying to fix the kv_cache update issue by changing the tracing into a tensor subclass. However, this seems to have less success than the fx tracer. The fx tracer breaks because

k_out[:,:, input_pos] = k_val

gets traced as

new_var = torch.ops.aten.index_put_(k_out, [None, None, input_pos], k_val)

with new_var never being accessed afterward. new_var becomes the correct MultiInput value, but is then lost.

The subclass, on the other hand, tries to use the func "<slot wrapper '__setitem__' of 'torch._C.TensorBase' objects>", which seems not to mutate k_out, so the attempt to make it a MultiTensor fails.
Test Plan: sh run.sh
Reviewers:
Subscribers:
Tasks:
Tags: