distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
## Description Add a "Step Gallery" section that gets automatically built from code. The gallery will list the steps from the current version of distilabel, and will display a nice...
**Is your feature request related to a problem? Please describe.** Inference Endpoints will by default use the cache unless explicitly specified otherwise, so we should add a flag to control...
## Description The current `Pipeline` implementation creates a subprocess to execute each `Step`. There are some `Step`s that only execute simpler transformations and that are very light to execute, therefore,...
**Is your feature request related to a problem? Please describe.** The `LoadHubDataset` step can only work if connected to Hugging Face. **Describe the solution you'd like** - Remove the hardcoded...
## Description When creating a `Step`, it's required to pass a `pipeline` argument or to create the `Step` within the pipeline context manager, so that one is assigned automatically. This can...
## Description In previous distilabel versions, someone could use an `LLMPool` with N LLMs but use only a number < N of them to generate responses. In the current version, there...
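A minimal sketch of the behavior this request describes: a pool of N LLMs where only a subset of size k < N generates responses for each prompt. It assumes each LLM is a plain callable from prompt to response; `generate_with_subset` and `num_generators` are hypothetical names for illustration, not part of distilabel's `LLMPool` API.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical stand-in for an LLM: any callable mapping a prompt to a response.
LLM = Callable[[str], str]


def generate_with_subset(
    llms: Sequence[LLM],
    prompt: str,
    num_generators: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Generate responses from `num_generators` LLMs sampled out of the pool."""
    if not 1 <= num_generators <= len(llms):
        raise ValueError("num_generators must be between 1 and len(llms)")
    rng = rng or random.Random()
    # Sample without replacement so no LLM answers the same prompt twice.
    chosen = rng.sample(list(llms), num_generators)
    return [llm(prompt) for llm in chosen]


if __name__ == "__main__":
    # Three fake LLMs; only two of them are used per prompt.
    pool = [lambda p, i=i: f"llm{i}: {p}" for i in range(3)]
    print(generate_with_subset(pool, "hello", 2, random.Random(0)))
```

Sampling per prompt (rather than fixing the subset once) spreads the generation load across the whole pool; passing a seeded `random.Random` keeps the selection reproducible.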
Currently, each `Step` of the `Pipeline` gets executed in a single process. It would be good to achieve parallelization at `Step` level too, i.e. one step of the pipeline uses...
**Is your feature request related to a problem? Please describe.** In `distilabel
**Describe the bug** I've defined a custom `Step` that seems to cause the `Pipeline` to get stuck while loading in Google Colab, as discussed with @gabrielmbmb. Apparently @frascuchon had...
After I successfully ran the pipeline once, I can no longer run my code again, even if I change the name, the input data and the related parameters; it reports the following error....