Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Results 168 distilabel issues

## Description This PR adds `Tasks` to replicate [APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets](https://arxiv.org/abs/2406.18518), which yielded the following dataset: [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k). The following is a draft pipeline...

enhancement

**Is your feature request related to a problem? Please describe.** We could create an example replicating this paper; we have most of the pieces and it seems quite interesting: [CRAFT Your...

documentation
enhancement

This PR adds a generator step to listen to Argilla events.

closes https://github.com/argilla-io/distilabel/issues/663 ### Issue 01ai changed the base URL of their managed service, but it's available and faster than ever! ### Solution Adds OneAI client to distilabel with...

**Is your feature request related to a problem? Please describe.** Google AI models can be used with their [genai](https://pypi.org/project/google-generativeai/) library. I think it would be nice to have support for...

**Is your feature request related to a problem? Please describe.** Model-based evaluation is at the heart of successful model development -- as a reward model for training, and as a...

## Description As suggested by @alvarobartt, it would be nice to integrate [`mlx-lm`](https://pypi.org/project/mlx-lm/)

enhancement

## Description Integrate [`sglang`](https://github.com/sgl-project/sglang)

enhancement

## Description [llm-swarm](https://github.com/huggingface/llm-swarm)

enhancement