distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
## Description This PR adds `Tasks` to replicate [APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets](https://arxiv.org/abs/2406.18518), which yielded the following dataset: [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k). The following is a draft pipeline...
**Is your feature request related to a problem? Please describe.** We could create an example replicating this paper; we have most of the pieces and it seems quite interesting: [CRAFT Your...
This PR adds a generator step to listen to Argilla events.
closes https://github.com/argilla-io/distilabel/issues/663 ### Issue 01ai changed the base URL of their managed service, but it's available and faster than ever! ### Solution Adds OneAI client to distilabel with...
**Is your feature request related to a problem? Please describe.** Google AI models can be used with their [genai](https://pypi.org/project/google-generativeai/) library. I think it would be nice to have support for...
**Is your feature request related to a problem? Please describe.** Model-based evaluation is at the heart of successful model development -- as a reward model for training, and as a...
## Description As suggested by @alvarobartt, it would be nice to integrate [`mlx-lm`](https://pypi.org/project/mlx-lm/)
## Description Integrate [`sglang`](https://github.com/sgl-project/sglang)
## Description [llm-swarm](https://github.com/huggingface/llm-swarm)