Bitsandbytes
Could there be a way to load a transformer from Hugging Face with bitsandbytes? That would make model loading easier. I might add this after work, but it would be nice to have.
I just saw that pull request #8 does that.
Building your own loader looks pretty simple (someone tell me I'm wrong, please):

```python
import guidance


class AutoLLM(guidance.llms.Transformers):
    cache = guidance.llms.LLM._open_cache("_auto.diskcache")

    def __init__(self, model, tokenizer=None, device_map=None, **kwargs):
        """Create a new auto model, loaded in 8-bit."""
        import transformers

        # Tokenizers don't take device_map/load_in_8bit; only load one
        # if the caller didn't pass their own.
        if tokenizer is None:
            tokenizer = transformers.AutoTokenizer.from_pretrained(model)
        model = transformers.AutoModelForCausalLM.from_pretrained(
            model,
            device_map=device_map or "auto",
            load_in_8bit=True,
        )
        super().__init__(model, tokenizer=tokenizer, device_map=device_map, **kwargs)
```
```python
llm = AutoLLM(model=...)
prompt = guidance(...)
prompt(llm=llm)
```
Oh, it's much easier: just pass an already loaded tokenizer and model into `guidance.llms.Transformers`: https://github.com/microsoft/guidance/blob/main/guidance/llms/_transformers.py#L17
Here is my code:

```python
import guidance
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto", load_in_8bit=True)
llm = guidance.llms.Transformers(model, tokenizer)
```
I am going to close this.
I think this would be better with documentation. Perhaps you could add an example to the home page (say, using 8-bit loading and automatic device placement) and improve the class documentation.