
[BUG] `distilabel_metadata` column should get merged automatically by `GroupColumns`

Status: Open. Opened by gabrielmbmb.

Describe the bug

Consider a pipeline like the following, where three `ChatGeneration` steps (one per model) feed a single `GroupColumns` step:

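```python
from distilabel.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import GroupColumns
from distilabel.steps.tasks import ChatGeneration

with Pipeline(name="pipe") as pipe:
    ...
    # One ChatGeneration task per model; each reads the "conversation"
    # column as its "messages" input.
    chat_generations = [
        ChatGeneration(
            llm=InferenceEndpointsLLM(model_id=model_id),
            input_mappings={"messages": "conversation"},
            input_batch_size=20,
        )
        for model_id in (
            "mistralai/Mixtral-8x7B-Instruct-v0.1",
            "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
            "meta-llama/Meta-Llama-3-70B-Instruct",
        )
    ]

    # Collect each task's "generation" and "model_name" into list columns.
    group_generations = GroupColumns(
        columns=["generation", "model_name"],
        output_columns=["generations", "generations_model_names"],
    )
```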

After `GroupColumns` runs, the resulting `distilabel_metadata` column only contains the data from one of the upstream `ChatGeneration` steps; the metadata produced by the other steps is dropped.
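To make the symptom concrete, here is a simplified, hypothetical model of grouping rows from several upstream steps. It is not distilabel's implementation: it only assumes rows are plain dicts, that each upstream step contributes one batch of the same length, and that non-grouped columns are taken from a single upstream row. The function name `naive_group_columns` and the metadata keys (`raw_output_step_0`, `raw_output_step_1`) are made up for illustration.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def naive_group_columns(
    inputs: Sequence[List[Row]],
    columns: List[str],
    output_columns: List[str],
) -> List[Row]:
    """Hypothetical model of the reported symptom, not distilabel's code.

    Rows at the same position in each upstream batch are combined: each
    column in `columns` is collected into a list stored under the matching
    name in `output_columns`, while every other column, including
    `distilabel_metadata`, is copied from the first batch's row only.
    """
    grouped: List[Row] = []
    for rows in zip(*inputs):
        # Non-grouped columns come from a single upstream row.
        out = {k: v for k, v in rows[0].items() if k not in columns}
        for col, out_col in zip(columns, output_columns):
            out[out_col] = [row[col] for row in rows]
        grouped.append(out)
    return grouped


# Two upstream batches with one row each; the metadata keys are hypothetical.
batches = [
    [{"generation": "a", "model_name": "m1",
      "distilabel_metadata": {"raw_output_step_0": "a"}}],
    [{"generation": "b", "model_name": "m2",
      "distilabel_metadata": {"raw_output_step_1": "b"}}],
]
result = naive_group_columns(
    batches,
    columns=["generation", "model_name"],
    output_columns=["generations", "generations_model_names"],
)
print(result[0]["generations"])          # ['a', 'b']
print(result[0]["distilabel_metadata"])  # {'raw_output_step_0': 'a'}
```

The grouped columns correctly hold one entry per upstream step, but the second step's metadata never reaches the output row, which matches the shape of the problem described in this report.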

Expected behaviour

The `distilabel_metadata` column should contain the data from all the upstream steps, merged automatically by `GroupColumns`.
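One way the expected behaviour could look is sketched below in plain Python. This is an illustration under stated assumptions, not distilabel's API: rows are dicts, each upstream step contributes one batch of equal length, and each step writes its metadata under its own keys. The function name `group_columns_merging_metadata` and the metadata keys are hypothetical.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]
METADATA_COLUMN = "distilabel_metadata"


def group_columns_merging_metadata(
    inputs: Sequence[List[Row]],
    columns: List[str],
    output_columns: List[str],
) -> List[Row]:
    """Hypothetical sketch: group columns and merge every step's metadata."""
    grouped: List[Row] = []
    for rows in zip(*inputs):
        # Copy non-grouped columns (other than metadata) from the first row.
        out = {
            k: v
            for k, v in rows[0].items()
            if k not in columns and k != METADATA_COLUMN
        }
        for col, out_col in zip(columns, output_columns):
            out[out_col] = [row[col] for row in rows]
        # Merge each upstream row's metadata dict into one. Assumes steps
        # use distinct keys; on a collision, the later step's value wins.
        merged: Dict[str, Any] = {}
        for row in rows:
            merged.update(row.get(METADATA_COLUMN) or {})
        if any(METADATA_COLUMN in row for row in rows):
            out[METADATA_COLUMN] = merged
        grouped.append(out)
    return grouped


# Two upstream batches with one row each; the metadata keys are hypothetical.
batches = [
    [{"generation": "a", "model_name": "m1",
      "distilabel_metadata": {"raw_output_step_0": "a"}}],
    [{"generation": "b", "model_name": "m2",
      "distilabel_metadata": {"raw_output_step_1": "b"}}],
]
result = group_columns_merging_metadata(
    batches,
    columns=["generation", "model_name"],
    output_columns=["generations", "generations_model_names"],
)
print(result[0]["distilabel_metadata"])
# {'raw_output_step_0': 'a', 'raw_output_step_1': 'b'}
```

A flat `dict.update` merge is the simplest choice and is only safe if upstream steps never reuse a metadata key; if they might, nesting each step's metadata under its step name would avoid silent overwrites at the cost of a different column shape.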

Posted by gabrielmbmb on Jul 19, 2024, 11:07.