Specifying set with at least 1 element

Open MilesCranmer opened this issue 9 months ago • 4 comments

Hi all,

Thanks for the great library. I'm trying to categorize publications into ML/Astro/Physics. Just curious what the right way to do this is.

I am using the following schema:

import outlines
from pydantic import BaseModel, field_validator
from enum import Enum

class Category(str, Enum):
    ML = "ML"
    Astro = "Astro"
    Physics = "Physics"

class PublicationCategories(BaseModel):
    categories: set[Category]

    @field_validator('categories')
    @classmethod
    def ensure_foobar(cls, v):
        if len(v) == 0:
            raise ValueError('No categories found')
        return v

model = outlines.models.transformers(
    "HuggingFaceTB/SmolLM-135M-Instruct",
    device="auto",
)
category_extractor = outlines.generate.json(model, PublicationCategories)

The validator raises this error when the LLM returns an empty list:

  Value error, No categories found [type=value_error, input_value=[], input_type=list]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error

So I am wondering what the proper way is to specify a field that is a set with at least 1 element. Thanks!

MilesCranmer avatar Mar 23 '25 21:03 MilesCranmer

There's no straightforward way that I can think of in the current state of the library. An ugly option that would work is to use an Enum or a regex that lists all possible combinations. By the way, in Outlines v1 you can directly provide an Enum or a Literal as an output type. Although still not ideal, the feature proposed in #1585 could also be useful in your case.
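
As a rough sketch of the "Enum with all possible combinations" idea, the snippet below builds such an Enum from the Category enum in the original post using only the standard library. The names CategorySet, non_empty_subsets, and parse_category_set, as well as the comma-joined value format, are assumptions made for illustration and are not part of Outlines.

```python
from enum import Enum
from itertools import combinations


class Category(str, Enum):
    ML = "ML"
    Astro = "Astro"
    Physics = "Physics"


def non_empty_subsets(items):
    """Yield every non-empty combination of items, smallest first."""
    items = list(items)
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


# Hypothetical naming scheme: member names are joined with "_",
# member values are joined with ",". The empty set is never a member.
CategorySet = Enum(
    "CategorySet",
    {
        "_".join(c.name for c in combo): ",".join(c.value for c in combo)
        for combo in non_empty_subsets(Category)
    },
    type=str,
)


def parse_category_set(choice):
    """Turn a CategorySet member back into a set of Category members."""
    return {Category(part) for part in choice.value.split(",")}
```

With three categories this gives 7 members, but the count grows as 2^n - 1, so the approach is only practical for a small number of categories. How CategorySet is then passed to Outlines (as a model field or directly as an output type) depends on the library version.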

RobinPicard avatar Jun 23 '25 13:06 RobinPicard

Wouldn't list[Category] work here?

rlouf avatar Jun 23 '25 13:06 rlouf

The issue is that a value could be repeated several times. There's also a bug related to lists of enums: #1630.

RobinPicard avatar Jun 23 '25 14:06 RobinPicard

This cannot be done unless you impose constraints dynamically, since you need to know what has already been generated.
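
To illustrate what "dynamic" means here, the toy function below computes which choices are still allowed given the categories emitted so far. It is a plain-Python sketch of the idea and not an Outlines API; CATEGORIES, STOP, and allowed_next are hypothetical names.

```python
CATEGORIES = ("ML", "Astro", "Physics")
STOP = "<stop>"  # hypothetical end-of-list marker


def allowed_next(generated):
    """Return the choices permitted after the items in `generated`."""
    # A category that was already emitted may not be emitted again.
    remaining = [c for c in CATEGORIES if c not in generated]
    # Stopping is only legal once at least one category has been emitted.
    return remaining + ([STOP] if generated else [])
```

The allowed set changes after every emitted item, which is why a fixed schema alone cannot express "unique and non-empty" without enumerating every combination up front.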

rlouf avatar Jun 23 '25 16:06 rlouf