distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
**Is your feature request related to a problem? Please describe.** We don't have any end-to-end tests checking different `Pipeline`s (with different combinations of `Task`s and `LLM`s). **Describe the...
Along with the generated dataset, it would be good to return a data structure containing statistics of the generation such as elapsed time, total tokens generated by the labeller, etc.
The idea is to implement a new LLM for using Replicate endpoints. Ideally, it should cover `public` and private `deployments`. Draft here: https://github.com/argilla-io/distilabel/pull/47 See an example of an HTTP-based implementation:...
The idea would be to build and run a benchmark with at least the following datasets: [HHH Alignment](https://huggingface.co/datasets/HuggingFaceH4/hhh_alignment) & [MT Bench Human Judgment](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments). Our current preference tasks are: - UltraFeedback:...
## Description This PR fixes several issues that caused deadlocks, and produced a final dataset with unordered batches, when a routing batch function was used in a pipeline. Several...
**Describe the bug** When stopping a pipeline with CTRL+C, the batches from the step input queues are prepended back to the `_BatchManager` so no information is lost. The `_BatchManagerStep` contains...
**Is your feature request related to a problem? Please describe.** The `LLM` classes could use the docstrings from the parent class, see for example `AnyscaleLLM`. **Describe the solution you'd like**...
Closes #608 I've implemented the two images for running Distilabel: one that builds from `runpod/pytorch:2.1.1-py3.10-cuda12.1.1-devel-ubuntu22.04` and is able to use CUDA, and a more constrained one that builds from `python:3.11-slim`. To try...
**Is your feature request related to a problem? Please describe.** In #601 we added the possibility to generate structured outputs from `llama-cpp`, `transformers` and `vllm` using `outlines`. It would be...
## Description Since we're reproducing some papers with the `distilabel` task created to do so, we recently created `PrometheusEval`, but no tutorial has been uploaded yet; only https://x.com/alvarobartt/status/1788152893461123105 was posted....