exllama
openllama support
Hi, really nice work here! I really appreciate that you've brought Llama inference to consumer-grade GPUs!! There is an ongoing project, https://github.com/openlm-research/open_llama, which seems to have a lot of potential. Do you think it will be supported in the future? Thanks!
Afaik it should be supported natively, have you tried it? The underlying architecture is the same as the Llama models.
Okay, good to know @nikshepsvn, I will try that tomorrow. Will update here!
I've always assumed as much but just decided I'd look into it when they release a 33B model. I'm an elitist.
Confirming that open_llama_13b, quantized with GPTQ-for-LLaMa to 4 bits with group size 32, works well.
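For anyone else who wants to try it, here is a rough sketch of loading a quantized OpenLLaMA checkpoint through exllama's Python classes, modeled on the repo's basic example script. The model directory path and sampling settings are placeholders, not the exact ones I used, so adjust them for your own setup.

```python
import os, glob

# Classes from the exllama repo (run this from the repo root so the modules are importable)
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

# Hypothetical path to a GPTQ-quantized open_llama_13b (4-bit, group size 32) directory
model_directory = "/models/open_llama_13b-4bit-32g/"

tokenizer_path = os.path.join(model_directory, "tokenizer.model")
model_config_path = os.path.join(model_directory, "config.json")
model_path = glob.glob(os.path.join(model_directory, "*.safetensors"))[0]

# Build the model, tokenizer, cache and generator the same way as for a regular Llama model
config = ExLlamaConfig(model_config_path)
config.model_path = model_path
model = ExLlama(config)
tokenizer = ExLlamaTokenizer(tokenizer_path)
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

# Example sampling settings (placeholders)
generator.settings.temperature = 0.95
generator.settings.top_p = 0.65

print(generator.generate_simple("Once upon a time,", max_new_tokens=200))
```

Since OpenLLaMA uses the same architecture and tokenizer format as Llama, nothing OpenLLaMA-specific is needed here; the only differences are the checkpoint files you point it at.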