gpt-fast
fixing GPTQ
Stack from ghstack (oldest at bottom):
- -> #148
Summary:
Trying to fix the kv_cache update issue by changing the tracing into a tensor subclass. However, this seems to have less success than the fx tracer. The fx tracer breaks because

k_out[:,:, input_pos] = k_val

gets traced as

new_var = torch.ops.aten.index_put_(k_out, [None, None, input_pos], k_val)

with new_var never being accessed afterward. new_var becomes the correct MultiInput value, but is then lost.

The subclass, on the other hand, tries to use the func "<slot wrapper '__setitem__' of 'torch._C.TensorBase' objects>", which seems not to mutate k_out, so the attempt to make it a MultiTensor fails.
Test Plan: sh run.sh
Reviewers:
Subscribers:
Tasks:
Tags: