distilabel issues

[FEATURE] Add end to end tests

**Is your feature request related to a problem? Please describe.** We don't have any end-to-end tests checking different `Pipeline`s (with different combinations of `Task`s and `LLM`s). **Describe the...

gabrielmbmb

enhancement

[FEATURE] Return generation statistics

1

Along with the generated dataset, it would be good to return a data structure containing statistics of the generation such as elapsed time, total tokens generated by the labeller, etc.

gabrielmbmb

enhancement

team: ml
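As an illustration of the generation statistics request above, here is a minimal sketch of what such a data structure could look like. Everything in it is an assumption made for illustration: `GenerationStats`, `generate_with_stats`, and the whitespace-based token counter are hypothetical names and are not part of the distilabel API.

```python
import time
from dataclasses import dataclass


@dataclass
class GenerationStats:
    """Hypothetical container for generation statistics (not a distilabel API)."""

    elapsed_seconds: float = 0.0
    total_tokens: int = 0
    num_generations: int = 0

    def record(self, tokens: int) -> None:
        # Accumulate the token count of a single generation.
        self.total_tokens += tokens
        self.num_generations += 1

    @property
    def tokens_per_generation(self) -> float:
        # Average tokens per generation; 0.0 when nothing was generated.
        if self.num_generations == 0:
            return 0.0
        return self.total_tokens / self.num_generations


def generate_with_stats(prompts, generate_fn, count_tokens=lambda text: len(text.split())):
    """Run `generate_fn` over `prompts` and return `(outputs, stats)`.

    `count_tokens` defaults to a naive whitespace split, which is only a
    stand-in for a real tokenizer.
    """
    stats = GenerationStats()
    outputs = []
    start = time.perf_counter()
    for prompt in prompts:
        output = generate_fn(prompt)
        stats.record(count_tokens(output))
        outputs.append(output)
    stats.elapsed_seconds = time.perf_counter() - start
    return outputs, stats
```

Returning the statistics next to the outputs, rather than logging them, keeps them available for later inspection or for attaching to the resulting dataset.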

[FEATURE] Add support for Replicate endpoints

The idea is to implement a new LLM for using Replicate endpoints. Ideally it should cover `public` and private `deployments`. A draft is available at https://github.com/argilla-io/distilabel/pull/47. See an example of an HTTP-based implementation:...

dvsrepo

enhancement

good first issue

help wanted

[FEATURE] Benchmark existing preference tasks (UltraFeedback, UltraJudge, JudgeLM)

4

The idea would be to build and run a benchmark with at least the following datasets: [HHH Alignment](https://huggingface.co/datasets/HuggingFaceH4/hhh_alignment) & [MT Bench Human Judgment](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments). Our current preference tasks are: - UltraFeedback:...

dvsrepo

enhancement

good first issue

help wanted

team: ml

Fix routing batch function deadlocks and unordered batches

## Description This PR fixes several issues that caused deadlocks and a final dataset with unordered batches when a routing batch function was used in a pipeline. Several...

gabrielmbmb

fix

[BUG] Batches are not prepended correctly when CTRL+C

**Describe the bug** When stopping a pipeline with CTRL+C, the batches from the step input queues are prepended back to the `_BatchManager` so no information is lost. The `_BatchManagerStep` contains...

gabrielmbmb

bug

[FEATURE] Create decorator to include the docstrings of the parent class

**Is your feature request related to a problem? Please describe.** The `LLM` classes could use the docstrings from the parent class, see for example `AnyscaleLLM`. **Describe the solution you'd like**...

plaguss

enhancement
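One possible shape for the docstring decorator requested above is sketched below. It is an assumption-laden illustration, not the solution described in the issue: `inherit_docstrings`, `BaseLLM`, and `ChildLLM` are hypothetical names, and the sketch only handles the class docstring and plain methods.

```python
import inspect


def inherit_docstrings(cls):
    """Class decorator that fills in missing docstrings from parent classes."""
    # Parents in method resolution order, skipping `object` so its generic
    # docstring is never copied.
    parents = [base for base in cls.__mro__[1:] if base is not object]

    # Class-level docstring: take the first documented parent.
    if cls.__doc__ is None:
        for base in parents:
            if base.__doc__:
                cls.__doc__ = base.__doc__
                break

    # Method docstrings: only plain functions defined directly on `cls`.
    for name, attr in vars(cls).items():
        if inspect.isfunction(attr) and attr.__doc__ is None:
            for base in parents:
                parent_attr = getattr(base, name, None)
                if parent_attr is not None and parent_attr.__doc__:
                    attr.__doc__ = parent_attr.__doc__
                    break
    return cls


class BaseLLM:  # hypothetical parent class
    """Base class for language models."""

    def generate(self, prompt):
        """Generate a completion for `prompt`."""
        raise NotImplementedError


@inherit_docstrings
class ChildLLM(BaseLLM):  # hypothetical subclass without its own docstrings
    def generate(self, prompt):
        return prompt.upper()
```

With this sketch, `ChildLLM.__doc__` and `ChildLLM.generate.__doc__` carry the text written on `BaseLLM`, while any docstring the subclass defines itself is left untouched.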

Docker Image for running distilabel CLI

2

Closes #608. I've implemented two images for running distilabel: one that builds from `runpod/pytorch:2.1.1-py3.10-cuda12.1.1-devel-ubuntu22.04` and can use CUDA, and a more constrained one that builds from `python:3.11-slim`. To try...

ignacioct

[INTEGRATION] Add `instructor` to generate structured outputs from private APIs

**Is your feature request related to a problem? Please describe.** In #601 we added the possibility to generate structured outputs from `llama-cpp`, `transformers` and `vllm` using `outlines`. It would be...

plaguss

enhancement

[FEATURE] Add `PrometheusEval` example and paper reproduction

## Description Since we're reproducing some papers with the `distilabel` task created to do so, we recently created `PrometheusEval`, but no tutorial has been uploaded yet, only https://x.com/alvarobartt/status/1788152893461123105 was posted....

alvarobartt

documentation
