
⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.

Results 126 distilabel issues

## Description This PR modifies the default behaviour of `LoadHubDataset` to use `streaming=False` as default, and tries to fetch the column info from the cached dataset if found. Closes #561,...

improvement

This draft PR proposes a way of using a DSPy prediction module as a text generation step. The advantage of this is that text generation could use an optimised, evaluated,...

## Description This PR aligns the kwargs for some of the implemented `LLM` subclasses, based on their engine counterparts, so that all the kwargs can be provided to the `LLM`...

fix
improvement

**Is your feature request related to a problem? Please describe.** I appreciate the work distilabel is doing to make it easier for the community to produce high-quality datasets. Thank you!...

improvement

**Describe the bug** Apparently, the cache location in the `Pipeline.run` method differs before and after calling `super().run`, since the signature is updated and that modifies the path, so...

bug
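To illustrate the kind of behaviour the report above describes, here is a minimal sketch, assuming the cache directory is derived from a hash of the pipeline's contents. The `signature` and `cache_location` helpers and the directory layout are hypothetical and are not distilabel's actual implementation; they only show how a path computed before the signature changes can differ from one computed afterwards.

```python
import hashlib
from pathlib import Path


def signature(step_names):
    """Hypothetical signature: a hash over the pipeline's step names."""
    joined = "-".join(sorted(step_names))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def cache_location(base_dir, pipeline_name, step_names):
    """Hypothetical cache layout: <base>/<pipeline name>/<signature>."""
    return Path(base_dir) / pipeline_name / signature(step_names)


steps = ["load_dataset", "generate"]
before = cache_location("cache", "demo", steps)

# Assume something that happens during the run changes the signature inputs.
steps.append("extra_step")
after = cache_location("cache", "demo", steps)

# The two paths share a parent directory but no longer point to the same place.
same_location = before == after
```

Under these assumptions, computing the location once and reusing it would keep reads and writes pointed at the same directory.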

## Description Make the `_Step.model_post_init` less strict to allow instantiating steps without a `Pipeline` and throwing a warning instead of raising a `ValueError`. This should simplify testing steps without the...

improvement
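The change described above, warning instead of raising when a step has no pipeline, can be sketched roughly as follows. This is an illustration with a hypothetical `Step` class and a plain list standing in for a pipeline; it is not distilabel's `_Step` or its `model_post_init`.

```python
import warnings


class Step:
    """Hypothetical stand-in for a pipeline step; not distilabel's `_Step`."""

    def __init__(self, name, pipeline=None):
        self.name = name
        self.pipeline = pipeline
        self._post_init()

    def _post_init(self):
        if self.pipeline is None:
            # Lenient behaviour: warn rather than raise a ValueError.
            warnings.warn(
                f"Step '{self.name}' was created without a pipeline.",
                UserWarning,
                stacklevel=3,
            )
            return
        # When a pipeline is given, register the step with it.
        self.pipeline.append(self)


# Capture the warning so the standalone case can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    standalone = Step("generate")

pipeline = []
attached = Step("generate", pipeline=pipeline)
```

With this shape, a test can build a step on its own and only sees a warning, while steps created with a pipeline are still registered as before.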

## Description See milestone https://github.com/argilla-io/distilabel/milestone/8

release

**Is your feature request related to a problem? Please describe.** In the generated dataset we say to rate following the annotation guidelines, but the guidelines are empty. **Describe the solution you'd like**...

enhancement

I want to push my results to huggingface with frequency 2000, like in distilabel 0.6.0:

```
freq = 2000
dataset_checkpoint = DatasetCheckpoint(
    path=Path.cwd() / "checkpoint_folder_evol_cn",
    save_frequency=freq,
    strategy = 'hf-hub',
    extra_kwargs={"repo_id": 'xxx/xxx',...
```

question