
Intended usage of Huggingface models with guidance?

Open THinnerichs opened this issue 10 months ago • 4 comments

Hey there,

I am trying to get a minimal example to run with Huggingface models over constrained generation:

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

from guidance import gen, select, models
hfm = models.transformers.Transformers(model, tokenizer)  # this already throws multiple warnings
lm = hfm + "this is a prompt" + gen(max_tokens=10)  # this throws errors for arbitrary Huggingface models

This errors for arbitrary Huggingface models. What am I missing here?

I could not find a proper guiding example, issue, or discussion on this, so I opened this issue. Please point me to the right place in case I missed something.

THinnerichs avatar Mar 03 '25 16:03 THinnerichs

I have the same issue!

rutgerklaassen avatar Mar 12 '25 15:03 rutgerklaassen

@THinnerichs this is indeed the right way to instantiate a transformers model if you want to construct the tokenizer and model directly, although you can also do hfm = models.transformers.Transformers("HuggingFaceTB/SmolLM-135M-Instruct") and we'll do all of the from_pretrained work for you.

That being said, I am seeing an error with this particular model / tokenizer -- linking an issue here: https://github.com/guidance-ai/llguidance/issues/138

In the meantime, would you give it a try with a model like microsoft/Phi-4-mini-instruct?

Either way, you'll likely still get some warnings when constructing the model about the chat template -- we still have some work to do to make sure we infer chat templates correctly.

hudson-ai avatar Mar 14 '25 20:03 hudson-ai

Thank you very much!

I wanted to use this HF model because it is small enough to fit in my laptop's memory.

Phi-4-mini works but is too big and slow for my needs. I couldn't find any entries on quantization, etc. Do you have 1. ideas on how to use this model more efficiently (for inference only), or 2. recommendations for smaller models to use instead?

THinnerichs avatar Mar 18 '25 10:03 THinnerichs

Do you have 1. ideas on how to use this model more efficiently (for inference only), or 2. recommendations for smaller models to use instead?

There are some relevant notes in https://github.com/guidance-ai/guidance/discussions/1176#discussioncomment-12754236 on making models run faster, though they don't always have the impact you'd expect (at least not for me).

The summary is to try installing accelerate and then add these two parameters to your call to models.Transformers:

model = models.Transformers(
    ...
    torch_dtype="auto",  # load weights in the checkpoint's native dtype instead of float32
    device_map="auto",   # let accelerate spread layers across the available GPU(s)/CPU
)

nchammas avatar Apr 15 '25 13:04 nchammas