
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Results 168 distilabel issues

**Is your feature request related to a problem? Please describe.** Both tasks seem to share a lot of logic so there is some code duplication. **Describe the solution you'd like**...

enhancement

**Is your feature request related to a problem? Please describe.** We might suffer from downloading unneeded large models. **Describe the solution you'd like** Something like this https://huggingface.co/distilabel-internal-testing/tiny-random-mistral was proposed by...

enhancement

## Description Add a custom `Step` that runs `DSPy`, even if it's only an example of how to use it via `distilabel` v1.0.0. The step could optimize a prompt from...

integrations

**Is your feature request related to a problem? Please describe.** Async is cool but debugging can be a pain. **Describe the solution you'd like** I would love to have synchronous...

idea

Create a notebook showing an end-to-end workflow with distilabel to create a preference dataset based on a ~200-page economic document (IMF World Economic Outlook, April 2023). The preference dataset could...

## Which page or section is this issue related to? Currently the code snippet in the vLLM section of the guide (https://distilabel.argilla.io/latest/technical-reference/llms/#vllm) looks like: ```python llm = vLLM( model=LLM(model="argilla/notus-7b-v1"), task=TextGenerationTask(),...

## Description A high-impact task for distilabel is one that generates follow-up turns or multi-turn dialogues (which can then be criticized/ranked). Given a conversation (or at least a...

enhancement

**Is your feature request related to a problem? Please describe.** In [this PR](https://github.com/argilla-io/distilabel/pull/203), we introduced the `ChatTask` but we want to add as much information to the data we send...

enhancement

The idea is to set up the Open In Colab and Open GitHub Source buttons as a template override in the mkdocs template, which should be possible. We have some...

documentation

Our current preference pipelines assume single-turn (instruction) datasets. To generate high-quality preference data, we need to support multi-turn data.

enhancement