[FEATURE] Conditional connections between Steps

Open gabrielmbmb opened this issue 10 months ago • 0 comments

Description

In previous distilabel versions, one could use an LLMPool holding N LLMs but use only a subset of fewer than N of them to generate responses.

In the current version, there is no equivalent of LLMPool and no mechanism for randomly sampling the LLMs used for generation.

Proposal

Update connect to accept several downstream steps, either as a list or as *args, and optionally a routing_batch_function that determines which downstream steps each batch is fed to.

import random
from typing import List

from distilabel.pipeline import Pipeline
from distilabel.steps import LoadHubDataset
from distilabel.steps.tasks import TextGeneration


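# Routing function: receives the names of the downstream steps and returns
# the names of the steps that should receive the batch.
def sample_two_llms(downstream_step_names: List[str]) -> List[str]:
    # Pick two distinct downstream steps at random.
    return random.sample(downstream_step_names, k=2)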


with Pipeline(
    name="simple-text-generation-pipeline",
    description="A simple text generation pipeline",
) as pipeline:
    load_dataset = LoadHubDataset(
        name="load_dataset",
        output_mappings={"prompt": "instruction"},
    )

    generate_text = TextGeneration(name="generate_text", llm=...)

    generate_text_2 = TextGeneration(name="generate_text_2", llm=...)

    generate_text_3 = TextGeneration(name="generate_text_3", llm=...)

    load_dataset.connect(
        generate_text,
        generate_text_2,
        generate_text_3,
        routing_batch_function=sample_two_llms,
    )
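
To make the intended semantics concrete, here is a minimal standalone sketch of how a routing function could be applied when a batch is dispatched. It does not use distilabel: sample_n_steps and dispatch_batch are hypothetical names introduced only for illustration, and the real implementation may look quite different.

```python
import random
from typing import Any, Callable, Dict, List, Optional

# A routing function maps the downstream step names to the subset to use.
RoutingBatchFunction = Callable[[List[str]], List[str]]


def sample_n_steps(n: int) -> RoutingBatchFunction:
    """Hypothetical factory: build a routing function that samples n steps."""

    def route(downstream_step_names: List[str]) -> List[str]:
        # Clamp n so random.sample does not raise when fewer steps exist.
        k = min(n, len(downstream_step_names))
        return random.sample(downstream_step_names, k=k)

    return route


def dispatch_batch(
    batch: List[Dict[str, Any]],
    downstream_step_names: List[str],
    routing_batch_function: Optional[RoutingBatchFunction] = None,
) -> Dict[str, List[Dict[str, Any]]]:
    """Hypothetical helper: decide which downstream steps receive the batch."""
    if routing_batch_function is None:
        # Without a routing function, every downstream step gets the batch.
        selected = list(downstream_step_names)
    else:
        selected = routing_batch_function(list(downstream_step_names))

    # Reject names that are not actually connected downstream.
    unknown = set(selected) - set(downstream_step_names)
    if unknown:
        raise ValueError(f"Unknown downstream steps: {sorted(unknown)}")

    return {name: batch for name in selected}


steps = ["generate_text", "generate_text_2", "generate_text_3"]
batch = [{"instruction": "Write a haiku about pipelines."}]
routed = dispatch_batch(batch, steps, routing_batch_function=sample_n_steps(2))
print(sorted(routed))  # two of the three step names, chosen at random
```

Using a factory such as sample_n_steps instead of hard-coding k=2 would let the same routing logic be reused for any pool size, which mirrors the "use fewer than N LLMs" behaviour described above.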

gabrielmbmb • Apr 18 '24 15:04