Albert Tseng

15 comments of Albert Tseng

We are still working on integration, albeit very slowly.

Cool, good to hear that our fine-tuning works for AQLM too. I also observed that the e2e fine-tuning can do most of what the blockwise fine-tuning does, which is good...

We have a better method coming out soon, so QuIP# development has been superseded. We may eventually get around to HF support, but without working CUDA graphs during generation, it's...

Hi Marc, CUDA graphs are essential for fast inference since they hide much of the kernel launch overhead. Many quantization algorithms like QuIP# use multiple kernels during inference and...
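For context, this is roughly the capture/replay pattern I mean; a minimal sketch where the `torch.nn.Linear`, the shapes, and the warm-up count are all stand-ins for a real quantized decode step, not QuIP#'s actual kernels:

```python
import torch

# Stand-in for a fixed-shape decode step (placeholder, not a real model).
model = torch.nn.Linear(4096, 4096).cuda().half()
static_input = torch.randn(1, 4096, device="cuda", dtype=torch.half)

# Warm up on a side stream so capture sees already-initialized kernels.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture: every kernel launch is recorded once into the graph, so the
# per-step CPU launch overhead is paid at capture time, not at replay time.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_output = model(static_input)

# Replay: copy new data into the static input buffer, then replay the
# whole recorded kernel sequence with a single launch.
static_input.copy_(torch.randn(1, 4096, device="cuda", dtype=torch.half))
g.replay()
print(static_output.norm())
```

The catch for multi-kernel quantized inference is that everything inside the capture must use static shapes and static buffers, which is exactly where many integrations break.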

Is there a list of such models and a guide on how to use CUDA graphs with transformers? I just tried torch.compile(model.generate, mode='reduce-overhead') on transformers 4.42.3 with Llama 2 7B...
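For anyone reproducing this, here is roughly the snippet in question; the model ID, dtype, prompt, and generation length are illustrative, and `mode="reduce-overhead"` is the torch.compile setting that enables CUDA graphs:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
).cuda()

# Compile generate itself with CUDA graphs enabled via reduce-overhead,
# as described in the comment above.
model.generate = torch.compile(model.generate, mode="reduce-overhead")

inputs = tok("Hello", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0]))
```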