
[FEATURE] Add `Callable` and `GlobalCallable` steps that take a custom `callable` as an argument

Open · davidberenstein1957 opened this issue 7 months ago · 3 comments

Is your feature request related to a problem? Please describe.

I think column operations in distilabel are not always intuitive. For simple transformations, it might be easier for users to pass a self-defined callable than to define a custom `Step`, which feels more cumbersome.

Describe the solution you'd like

```python
# Proposed API: `Callable` does not exist in distilabel.steps yet
from distilabel.steps import Callable

def my_function(sample: dict) -> dict:
    del sample["key"]
    sample["c"] = sample["a"] + sample["b"]
    return sample

Callable(
    name="callable",
    fn=my_function,
    # Assuming something like this is needed for validation
    inputs=["key", "a", "b"],  # defaults to all columns
    outputs=["c"],
)
```
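To make the proposal concrete, here is a minimal plain-Python sketch of how a row-wise `Callable` and a dataset-wide `GlobalCallable` could behave. It does not use distilabel at all: `CallableStep`, `GlobalCallableStep`, `process`, and `_check_columns` are hypothetical names (chosen partly to avoid clashing with `typing.Callable`), and the validation rules are assumptions about what `inputs`/`outputs` might check.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def _check_columns(step_name: str, row: Row, columns: List[str], what: str) -> None:
    # Raise early with a readable message instead of a bare KeyError inside `fn`.
    missing = [col for col in columns if col not in row]
    if missing:
        raise KeyError(f"{step_name}: {what} columns {missing} not found")


class CallableStep:
    """Hypothetical row-wise step: `fn` is applied to each row of a batch."""

    def __init__(
        self,
        name: str,
        fn: Callable[[Row], Row],
        inputs: Optional[List[str]] = None,
        outputs: Optional[List[str]] = None,
    ) -> None:
        self.name = name
        self.fn = fn
        self.inputs = inputs  # None: no input check, i.e. "defaults to all"
        self.outputs = outputs or []

    def process(self, batch: List[Row]) -> List[Row]:
        results = []
        for row in batch:
            _check_columns(self.name, row, self.inputs or [], "input")
            # Hand `fn` a shallow copy so `del`/assignments don't touch the caller's row.
            new_row = self.fn(dict(row))
            _check_columns(self.name, new_row, self.outputs, "output")
            results.append(new_row)
        return results


class GlobalCallableStep:
    """Hypothetical dataset-wide step: `fn` receives every row at once."""

    def __init__(
        self,
        name: str,
        fn: Callable[[List[Row]], List[Row]],
        inputs: Optional[List[str]] = None,
        outputs: Optional[List[str]] = None,
    ) -> None:
        self.name = name
        self.fn = fn
        self.inputs = inputs
        self.outputs = outputs or []

    def process(self, rows: List[Row]) -> List[Row]:
        for row in rows:
            _check_columns(self.name, row, self.inputs or [], "input")
        new_rows = self.fn([dict(row) for row in rows])
        for row in new_rows:
            _check_columns(self.name, row, self.outputs, "output")
        return new_rows


# Usage with the function from the proposal above.
def my_function(sample: Row) -> Row:
    del sample["key"]
    sample["c"] = sample["a"] + sample["b"]
    return sample


def dedupe_on_a(rows: List[Row]) -> List[Row]:
    # A global operation: it must see all rows to drop duplicates.
    seen, kept = set(), []
    for row in rows:
        if row["a"] not in seen:
            seen.add(row["a"])
            kept.append(row)
    return kept


row_step = CallableStep(name="callable", fn=my_function, inputs=["key", "a", "b"], outputs=["c"])
print(row_step.process([{"key": "x", "a": 1, "b": 2}]))
# [{'a': 1, 'b': 2, 'c': 3}]

global_step = GlobalCallableStep(name="dedupe", fn=dedupe_on_a, inputs=["a"])
print(global_step.process([{"a": 1}, {"a": 1}, {"a": 2}]))
# [{'a': 1}, {'a': 2}]
```

The split mirrors the title: a row-wise callable covers column operations like `my_function`, while a global callable is needed when the logic depends on other rows (deduplication, ranking, filtering by dataset statistics).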

Compare this with the current approach using the `@step` decorator:

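```python
from distilabel.steps import step
from distilabel.steps.typing import GeneratorStepOutput

# A generator step yields (batch_of_rows, is_last_batch) tuples.
@step(outputs=[...], step_type="generator")
def CustomGeneratorStep(offset: int = 0) -> GeneratorStepOutput:
    yield (
        ...,  # the batch: a list of dicts containing the declared outputs
        offset == 10,  # True marks the last batch
    )

# Use a different variable name so the `step` decorator is not shadowed.
my_step = CustomGeneratorStep(name="my-step")
```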
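In the generator snippet above, each `yield` emits a tuple of a batch of rows and a flag marking the last batch. The plain-Python sketch below imitates that contract without distilabel; `generate_batches` and `GeneratorOutput` are hypothetical names, and treating `offset` as "number of rows already produced, to skip on resume" is an assumption about its meaning.

```python
from typing import Any, Dict, Iterator, List, Tuple

Row = Dict[str, Any]
# Assumed to mirror the shape of GeneratorStepOutput: (batch, is_last_batch) tuples.
GeneratorOutput = Iterator[Tuple[List[Row], bool]]


def generate_batches(rows: List[Row], batch_size: int, offset: int = 0) -> GeneratorOutput:
    """Yield (batch, is_last_batch) tuples, skipping the first `offset` rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    remaining = rows[offset:]
    if not remaining:
        # Nothing left: still signal completion with an empty last batch.
        yield [], True
        return
    for start in range(0, len(remaining), batch_size):
        batch = remaining[start:start + batch_size]
        is_last = start + batch_size >= len(remaining)
        yield batch, is_last


data = [{"id": i} for i in range(5)]
for batch, is_last in generate_batches(data, batch_size=2):
    print([row["id"] for row in batch], is_last)
# [0, 1] False
# [2, 3] False
# [4] True
```

Even this small contract (batching, the last-batch flag, resuming from an offset) has to be understood before writing a custom step, which is part of why a plain callable would be easier for simple column operations.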


Describe alternatives you've considered

Custom `Step`s and `Task`s.

Additional context

https://discord.com/channels/879548962464493619/1217729625401196574/1265730218505539686

davidberenstein1957, Jul 25 '24 08:07