distilabel icon indicating copy to clipboard operation
distilabel copied to clipboard

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Results 168 distilabel issues
Sort by recently updated
recently updated
newest added

**Is your feature request related to a problem? Please describe.** Generation with reliance on external APIs might be expensive. **Describe the solution you'd like** Something like a pipeline method `expected_costs`...

idea

Is there a way to use a local dataset? Convert the data loaded in dataset.load_from_disk to a list, and Passing it to LoadDataFromDicts freezes the pipeline. code: ``` from datasets...

## Description Add a `GenstructTask` that allows to parse the output from https://huggingface.co/NousResearch/Genstruct-7B

enhancement

In some cases, e.g. testing your pipeline before running it, one would like to select only a couple of examples from the HF dataset loaded in `src.distilabel.steps.generators.huggingface.LoadHubDataset`. Therefore I offer...

**Describe the bug** I had trouble figuring out why my pipeline was failing and the error messages were not informative. I managed to obtain a way more useful error message...

enhancement

[Groq](https://wow.groq.com/why-groq/) is a startup developing LPU engines which provide 18x faster inference than GPUs. They currently provide Llama2-70B via API, and Mixtral 8x7B access on request. It would be great...

integrations

See [here](https://platform.openai.com/docs/api-reference/batch/create) for the API and [here](https://twitter.com/jeffintime/status/1779924149755924707?t=Tmo3Aoo62N5zA5PWLNiqUw) for the (Twitter) announcement. 50% discount would be huge as most large jobs running on distilabel are not time-critical for us. However there...

enhancement
integrations

**Is your feature request related to a problem? Please describe.** Currently, I get the message `Processing batch x` but this does not indicate how far we are. **Describe the solution...

enhancement

**Is your feature request related to a problem? Please describe.** I was speaking to Oras Al-Kubaisi, he proposed it might be nice to have a pipeline UI playground. I think...

enhancement

The `mp.Queue` that we're using to pass the data between the steps is very slow when it's used to send a lot of data (for example when accumulating `embeddings` from...

improvement