distilabel
[BUG] `distilabel_metadata` column should get merged automatically by `GroupColumns`
Describe the bug
When you have a pipeline like the following:
with Pipeline(name="pipe") as pipe:
    ...
    chat_generations = [
        ChatGeneration(
            llm=InferenceEndpointsLLM(model_id=model_id),
            input_mappings={"messages": "conversation"},
            input_batch_size=20,
        )
        for model_id in (
            "mistralai/Mixtral-8x7B-Instruct-v0.1",
            "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
            "meta-llama/Meta-Llama-3-70B-Instruct",
        )
    ]
    group_generations = GroupColumns(
        columns=["generation", "model_name"],
        output_columns=["generations", "generations_model_names"],
    )
The resulting `distilabel_metadata` column only contains the data from one of the upstream steps.
Expected behaviour
The `distilabel_metadata` column should contain the data from all the upstream steps.
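To make the expected behaviour concrete, here is a minimal sketch of one way the metadata could be merged. It is not distilabel's implementation: `merge_metadata` is a hypothetical helper, the `raw_output` key is purely illustrative, and collecting same-key values into a list is an assumption about what "contain the data from all the upstream steps" should look like.

```python
from collections import defaultdict
from typing import Any, Dict, List

METADATA_KEY = "distilabel_metadata"


def merge_metadata(*rows: Dict[str, Any]) -> Dict[str, Any]:
    """Merge the metadata dicts of rows coming from different upstream steps.

    Hypothetical helper: values that share a key are collected into a list,
    in the order the rows are passed in. Rows without metadata are skipped.
    """
    merged: Dict[str, List[Any]] = defaultdict(list)
    for row in rows:
        for key, value in row.get(METADATA_KEY, {}).items():
            merged[key].append(value)
    return dict(merged)


# One row per upstream step; the "raw_output" key is illustrative only.
rows = [
    {"generation": "a", METADATA_KEY: {"raw_output": "out-1"}},
    {"generation": "b", METADATA_KEY: {"raw_output": "out-2"}},
    {"generation": "c", METADATA_KEY: {"raw_output": "out-3"}},
]
merged = merge_metadata(*rows)
```

With this approach no upstream step's metadata is overwritten: each key in the merged column maps to one entry per upstream step that produced it.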