Bitsandbytes
Could there be a way to load a transformer from Hugging Face with bitsandbytes? That would make model loading easier. I might add this after work, but it would be nice to have.
I just saw that pull request #8 does that.
Building your own loader looks pretty simple (someone tell me I'm wrong, please):

```python
import guidance


class AutoLLM(guidance.llms.Transformers):
    cache = guidance.llms.LLM._open_cache("_auto.diskcache")

    def __init__(self, model, tokenizer=None, device_map=None, **kwargs):
        """Create a new auto model, loaded in 8-bit."""
        import transformers

        # Tokenizers don't take device_map/load_in_8bit; only load one
        # if the caller didn't pass their own.
        if tokenizer is None:
            tokenizer = transformers.AutoTokenizer.from_pretrained(model)
        model = transformers.AutoModelForCausalLM.from_pretrained(
            model,
            device_map=device_map or "auto",
            load_in_8bit=True,
        )
        super().__init__(model, tokenizer=tokenizer, device_map=device_map, **kwargs)
```
```python
llm = AutoLLM(model=...)
prompt = guidance(...)
prompt(llm=llm)
```
Oh, it's much easier: just pass an already loaded tokenizer and model into `guidance.llms.Transformers`: https://github.com/microsoft/guidance/blob/main/guidance/llms/_transformers.py#L17
Here is my code:

```python
import guidance
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto", load_in_8bit=True)
llm = guidance.llms.Transformers(model, tokenizer)
```
I am going to close this.
I think this would be better with documentation. Perhaps you could add an example to the home page (say, using 8-bit loading and automatic device placement) and improve the class documentation.