distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
**Is your feature request related to a problem? Please describe.** Generation with reliance on external APIs might be expensive. **Describe the solution you'd like** Something like a pipeline method `expected_costs`...
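A rough illustration of what such an estimate could compute, as a minimal sketch only: the function name `expected_cost_usd`, its parameters, and the per-1k-token pricing model are all assumptions made for illustration. None of them is an existing distilabel API, and the prices in the example call are placeholders.

```python
def expected_cost_usd(
    num_rows: int,
    input_tokens_per_row: int,
    output_tokens_per_row: int,
    usd_per_1k_input_tokens: float,
    usd_per_1k_output_tokens: float,
) -> float:
    """Estimate the cost of one generation step (hypothetical helper).

    Assumes the API bills input and output tokens separately, per 1,000 tokens.
    """
    if min(num_rows, input_tokens_per_row, output_tokens_per_row) < 0:
        raise ValueError("row and token counts must be non-negative")
    input_cost = num_rows * input_tokens_per_row / 1000 * usd_per_1k_input_tokens
    output_cost = num_rows * output_tokens_per_row / 1000 * usd_per_1k_output_tokens
    return input_cost + output_cost


# Placeholder numbers: 1,000 rows, 500 input and 200 output tokens per row.
estimate = expected_cost_usd(1000, 500, 200, 0.01, 0.03)
print(f"Estimated cost: ${estimate:.2f}")
```

Token counts per row would have to be measured or guessed before the run, so the result is only an upper-level estimate, not a bill.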
Is there a way to use a local dataset? Converting the data loaded with `dataset.load_from_disk` to a list and passing it to `LoadDataFromDicts` freezes the pipeline. Code: ``` from datasets...
## Description Add a `GenstructTask` that parses the output of https://huggingface.co/NousResearch/Genstruct-7B
In some cases, e.g. testing your pipeline before running it, one would like to select only a couple of examples from the HF dataset loaded in `src.distilabel.steps.generators.huggingface.LoadHubDataset`. Therefore I offer...
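The idea of taking only a few examples can be sketched independently of the loading step. This is a minimal sketch, assuming rows arrive as an iterable of dictionaries; the helper `take_first` and the `instruction` column are hypothetical and are not part of distilabel or of the proposal above.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def take_first(rows: Iterable[Dict[str, Any]], num_examples: int) -> List[Dict[str, Any]]:
    """Return at most `num_examples` rows without consuming the rest (hypothetical helper)."""
    if num_examples < 0:
        raise ValueError("num_examples must be non-negative")
    # islice stops early, so a large or streaming source is never fully read.
    return list(islice(rows, num_examples))


# Stand-in for a large dataset: a lazy generator of 1,000 rows.
rows = ({"instruction": f"Question {i}"} for i in range(1000))
sample = take_first(rows, 2)
print(sample)
```

Because only the first rows are pulled from the iterable, a quick test run does not pay for loading the whole dataset.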
**Describe the bug** I had trouble figuring out why my pipeline was failing and the error messages were not informative. I managed to obtain a way more useful error message...
[Groq](https://wow.groq.com/why-groq/) is a startup developing LPU engines which provide 18x faster inference than GPUs. They currently provide Llama2-70B via API, and Mixtral 8x7B access on request. It would be great...
See [here](https://platform.openai.com/docs/api-reference/batch/create) for the API and [here](https://twitter.com/jeffintime/status/1779924149755924707?t=Tmo3Aoo62N5zA5PWLNiqUw) for the (Twitter) announcement. A 50% discount would be huge, as most large jobs running on distilabel are not time-critical for us. However there...
**Is your feature request related to a problem? Please describe.** Currently, I get the message `Processing batch x` but this does not indicate how far we are. **Describe the solution...
**Is your feature request related to a problem? Please describe.** I was speaking to Oras Al-Kubaisi, he proposed it might be nice to have a pipeline UI playground. I think...
The `mp.Queue` that we're using to pass the data between the steps is very slow when it's used to send a lot of data (for example when accumulating `embeddings` from...