Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Results 168 distilabel issues

**Is your feature request related to a problem? Please describe.** We don't have any end-to-end tests checking different `Pipeline`s (with different combinations of `Task`s and `LLM`s). **Describe the...

enhancement

Along with the generated dataset, it would be good to return a data structure containing statistics of the generation such as elapsed time, total tokens generated by the labeller, etc.

enhancement
team: ml
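
As a rough illustration of the statistics request above, the sketch below returns a small stats object alongside the generated outputs. Everything here is hypothetical and not part of the distilabel API: `GenerationStats`, `generate_with_stats`, and the `label_fn` callable (assumed to return an `(output, tokens_used)` pair) are names made up for the example.

```python
import time
from dataclasses import dataclass


@dataclass
class GenerationStats:
    """Hypothetical container for generation statistics (not a distilabel class)."""

    elapsed_seconds: float = 0.0
    labeller_tokens: int = 0
    num_rows: int = 0

    @property
    def tokens_per_row(self) -> float:
        # Avoid a division by zero when nothing was generated.
        return self.labeller_tokens / self.num_rows if self.num_rows else 0.0


def generate_with_stats(rows, label_fn):
    """Run label_fn over rows and return (outputs, stats).

    label_fn is a stand-in for a labeller call; it is assumed to return
    an (output, tokens_used) tuple for each row.
    """
    stats = GenerationStats()
    outputs = []
    start = time.perf_counter()
    for row in rows:
        output, tokens = label_fn(row)
        outputs.append(output)
        stats.labeller_tokens += tokens
        stats.num_rows += 1
    stats.elapsed_seconds = time.perf_counter() - start
    return outputs, stats
```

Returning the stats as a separate object, rather than mixing them into the dataset rows, keeps the generated data unchanged for callers that do not need the extra information.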

The idea is to implement a new LLM for using Replicate endpoints. Ideally it should cover `public` and private `deployments`. Draft here: https://github.com/argilla-io/distilabel/pull/47 See an example of an HTTP-based implementation:...

enhancement
good first issue
help wanted

The idea would be to build and run a benchmark with at least the following datasets: [HHH Alignment](https://huggingface.co/datasets/HuggingFaceH4/hhh_alignment) & [MT Bench Human Judgment](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments). Our current preference tasks are: - UltraFeedback:...

enhancement
good first issue
help wanted
team: ml

## Description This PR fixes several issues that caused deadlocks and produced a final dataset with unordered batches when a routing batch function was used in a pipeline. Several...

fix

**Describe the bug** When stopping a pipeline with CTRL+C, the batches from the step input queues are prepended back to the `_BatchManager` so no information is lost. The `_BatchManagerStep` contains...

bug

**Is your feature request related to a problem? Please describe.** The `LLM` classes could use the docstrings from the parent class, see for example `AnyscaleLLM`. **Describe the solution you'd like**...

enhancement
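
The requested solution is cut off in the snippet above, so the following is only one possible way to reuse parent docstrings in plain Python, not the approach proposed in the issue. The decorator `inherit_docstrings` and the classes `BaseLLM` and `ChildLLM` are hypothetical names used for illustration; they are not distilabel classes.

```python
def inherit_docstrings(cls):
    """Fill in missing docstrings on cls and its own methods from its base classes."""
    # Class-level docstring: take the first non-empty one found along the MRO.
    if not cls.__doc__:
        for base in cls.__mro__[1:]:
            if base.__doc__:
                cls.__doc__ = base.__doc__
                break
    # Method-level docstrings: only touch callables defined directly on cls.
    for name, attr in vars(cls).items():
        if callable(attr) and not getattr(attr, "__doc__", None):
            for base in cls.__mro__[1:]:
                parent = getattr(base, name, None)
                if parent is not None and parent.__doc__:
                    attr.__doc__ = parent.__doc__
                    break
    return cls


class BaseLLM:
    """Base docs."""

    def generate(self, prompt):
        """Generate text."""
        raise NotImplementedError


@inherit_docstrings
class ChildLLM(BaseLLM):
    # No docstrings here: both are copied from BaseLLM by the decorator.
    def generate(self, prompt):
        return prompt.upper()
```

A decorator keeps the behaviour opt-in per class; a metaclass or an `__init_subclass__` hook on the base class would apply it to every subclass automatically.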

Closes #608. I've implemented the two images for running Distilabel: one that builds from `runpod/pytorch:2.1.1-py3.10-cuda12.1.1-devel-ubuntu22.04` and is able to use CUDA, and a more constrained one built from `python:3.11-slim`. To try...

**Is your feature request related to a problem? Please describe.** In #601 we added the possibility to generate structured outputs from `llama-cpp`, `transformers` and `vllm` using `outlines`. It would be...

enhancement

## Description Since we're reproducing some papers with the `distilabel` task created to do so, we recently created `PrometheusEval`, but no tutorial has been uploaded yet, only https://x.com/alvarobartt/status/1788152893461123105 was posted....

documentation