[FEATURE] Conditional connections between Steps
Description
In previous distilabel versions, one could use an `LLMPool` holding N LLMs and use only a subset of fewer than N of them to generate responses.
In the current version, there is no equivalent of `LLMPool` and no mechanism for randomly sampling the LLMs used for generation.
Proposal
Update `connect` to accept several downstream steps (as a list or as `*args`) and, optionally, a `routing_batch_function` that determines which of those downstream steps each batch is fed to.
```python
import random
from typing import List

from distilabel.pipeline import Pipeline
from distilabel.steps import LoadHubDataset
from distilabel.steps.tasks import TextGeneration


# Routing function: receives the names of all connected downstream steps
# and returns the subset that should receive the current batch.
def sample_two_llms(downstream_step_names: List[str]) -> List[str]:
    return random.sample(downstream_step_names, k=2)


with Pipeline(
    name="simple-text-generation-pipeline",
    description="A simple text generation pipeline",
) as pipeline:
    load_dataset = LoadHubDataset(
        name="load_dataset",
        output_mappings={"prompt": "instruction"},
    )

    generate_text = TextGeneration(name="generate_text", llm=...)
    generate_text_2 = TextGeneration(name="generate_text_2", llm=...)
    generate_text_3 = TextGeneration(name="generate_text_3", llm=...)

    # Proposed API: connect to several steps at once and route each batch
    # to only two of the three TextGeneration steps, chosen at random.
    load_dataset.connect(
        generate_text,
        generate_text_2,
        generate_text_3,
        routing_batch_function=sample_two_llms,
    )
```
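The following is a minimal, self-contained sketch of how a step could dispatch batches through a `routing_batch_function` under the proposed semantics. The `Step` class, its `send` method, and the broadcast-to-all default when no router is given are hypothetical stand-ins written only for illustration. They are not distilabel's actual classes or internals.

```python
import random
from typing import Callable, Dict, List, Optional

# Hypothetical alias: a router receives the names of all connected downstream
# steps and returns the names of the steps that should get the batch.
RoutingBatchFunction = Callable[[List[str]], List[str]]


class Step:
    """Hypothetical stand-in for a pipeline step (not distilabel's class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: List[List[dict]] = []  # batches delivered to this step
        self.downstream: Dict[str, "Step"] = {}
        self.routing_batch_function: Optional[RoutingBatchFunction] = None

    def connect(
        self,
        *steps: "Step",
        routing_batch_function: Optional[RoutingBatchFunction] = None,
    ) -> None:
        # Register every downstream step by name, plus an optional router.
        for step in steps:
            self.downstream[step.name] = step
        self.routing_batch_function = routing_batch_function

    def send(self, batch: List[dict]) -> List[str]:
        names = list(self.downstream)
        # Assumed default: without a router, broadcast to every downstream step.
        if self.routing_batch_function is None:
            targets = names
        else:
            targets = self.routing_batch_function(names)
        # Reject names that do not match any connected step.
        unknown = set(targets) - set(names)
        if unknown:
            raise ValueError(f"Router returned unknown steps: {sorted(unknown)}")
        for name in targets:
            self.downstream[name].received.append(batch)
        return targets


def sample_two_llms(downstream_step_names: List[str]) -> List[str]:
    return random.sample(downstream_step_names, k=2)


random.seed(0)
load_dataset = Step("load_dataset")
generators = [Step(f"generate_text_{i}") for i in (1, 2, 3)]
load_dataset.connect(*generators, routing_batch_function=sample_two_llms)

for _ in range(5):
    load_dataset.send([{"instruction": "Write a haiku"}])

# Each of the 5 batches reached exactly two of the three generators.
print({step.name: len(step.received) for step in generators})
```

Validating the router's output against the connected step names catches typos early, and keeping the router a plain function of step names keeps it independent of which LLM each step wraps.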