Specifying set with at least 1 element
Hi all,
Thanks for the great library. I'm trying to categorize publications into ML/Astro/Physics. Just curious what the right way to do this is.
I am using the following schema:
import outlines
from pydantic import BaseModel, field_validator
from enum import Enum

class Category(str, Enum):
    ML = "ML"
    Astro = "Astro"
    Physics = "Physics"

class PublicationCategories(BaseModel):
    categories: set[Category]

    @field_validator('categories')
    @classmethod
    def ensure_foobar(cls, v):
        if len(v) == 0:
            raise ValueError('No categories found')
        return v

model = outlines.models.transformers(
    "HuggingFaceTB/SmolLM-135M-Instruct",
    device="auto",
)
category_extractor = outlines.generate.json(model, PublicationCategories)
The validator raises this error when the LLM returns an empty list:
Value error, No categories found [type=value_error, input_value=[], input_type=list]
For further information visit https://errors.pydantic.dev/2.10/v/value_error
So I am wondering how I can specify a field that is a set with a minimum of 1 element. Thanks!
There's no straightforward way I can think of in the current state of the library. An ugly option that would work is an Enum or a regex that lists all possible combinations. By the way, in Outlines v1 you can directly provide an Enum or a Literal as an output type. Although still not ideal, the feature proposed in #1585 could be useful in your case.
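For illustration, here is a rough sketch of the "all possible combinations" workaround using only the standard library. The names CategorySet, CATEGORY_SET_REGEX, and parse_category_set are made up for this example, and the comma-joined string encoding is an assumption, not something Outlines prescribes. How the resulting Enum or regex is handed to Outlines depends on the version you use, so that call is left out.

```python
import re
from enum import Enum
from itertools import combinations

CATEGORIES = ["ML", "Astro", "Physics"]


def non_empty_subsets(items):
    """Yield every non-empty subset of items, keeping the input order."""
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


# Hypothetical Enum with one member per subset, e.g. ML_Astro = "ML,Astro".
# The empty subset is never generated, so "no category" is not a valid value.
CategorySet = Enum(
    "CategorySet",
    {"_".join(s): ",".join(s) for s in non_empty_subsets(CATEGORIES)},
    type=str,
)

# The same set of allowed strings expressed as a regex alternation.
CATEGORY_SET_REGEX = "|".join(
    re.escape(",".join(s)) for s in non_empty_subsets(CATEGORIES)
)


def parse_category_set(value: str) -> set[str]:
    """Turn a generated string such as "ML,Physics" back into a set."""
    return set(value.split(","))
```

The number of members grows as 2^n - 1, so this is only practical for a small number of categories like the three here.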
Wouldn't list[Category] work here?
The issue is that a value could be repeated several times. There's also a bug related to lists of enums: #1630
This cannot be done unless you impose constraints dynamically (you need to know what was already generated).
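To make the "dynamic" part concrete, here is a toy sketch of the state such a constraint would have to track. It is not Outlines code: allowed_next and the STOP marker are hypothetical names, and a real implementation would work on tokens rather than whole category strings.

```python
CATEGORIES = ("ML", "Astro", "Physics")
STOP = "<stop>"  # hypothetical marker meaning "close the set"


def allowed_next(generated):
    """Return the choices permitted at the next step.

    `generated` is the list of categories emitted so far. Categories that
    were already emitted are excluded (no duplicates), and stopping is only
    allowed once at least one category has been emitted (minimum of 1).
    """
    remaining = [c for c in CATEGORIES if c not in generated]
    return remaining + ([STOP] if generated else [])
```

The allowed choices change with what was generated, which is why a fixed schema alone cannot easily express both "no duplicates" and "at least one element".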