distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Results 168 distilabel issues

## Description This PR changes `LoadHubDataset` to use `streaming=False` by default, and tries to fetch the column info from the cached dataset if found. Closes #561,...

improvement

This draft PR proposes a way of using a DSPy prediction module as a text generation step. The advantage of this is that text generation could use an optimised, evaluated,...

## Description This PR aligns the kwargs for some of the implemented `LLM` subclasses, based on their engine counterparts, so that all the kwargs can be provided to the `LLM`...

fix
improvement

**Is your feature request related to a problem? Please describe.** I appreciate the work distilabel is doing and making it easier for the community to produce high-quality datasets. Thank you!...

improvement

**Describe the bug** Apparently, the cache location is different in the `Pipeline.run` method before and after calling the `super().run`, since the signature is updated, and it modifies the path, so...

bug
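
The kind of mismatch described in this report can be reproduced in a minimal sketch, assuming the cache directory is derived from a hash of the pipeline's signature. The class `ToyPipeline`, its `signature` and `cache_dir` members, and the `runtime_params` field are hypothetical and are not distilabel's implementation; they only show how changing the inputs to a signature changes the resolved path.

```python
import hashlib
import json
from pathlib import Path


class ToyPipeline:
    """Hypothetical pipeline used for illustration; NOT distilabel's Pipeline."""

    def __init__(self, name: str, base_dir: Path) -> None:
        self.name = name
        self.base_dir = base_dir
        self.runtime_params: dict = {}

    def signature(self) -> str:
        # Hash everything that identifies this pipeline configuration.
        payload = json.dumps(
            {"name": self.name, "params": self.runtime_params}, sort_keys=True
        )
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()

    @property
    def cache_dir(self) -> Path:
        # Recomputed on every access, so the path follows the signature.
        return self.base_dir / self.name / self.signature()


pipeline = ToyPipeline("demo", Path("cache"))
path_before = pipeline.cache_dir

# Simulate a run that updates parameters, and therefore the signature.
pipeline.runtime_params["batch_size"] = 8
path_after = pipeline.cache_dir

print(path_before != path_after)
```

In a design like this one, resolving the cache path once and reusing it for the whole run would keep reads and writes in the same location.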

## Description Make `_Step.model_post_init` less strict, so that steps can be instantiated without a `Pipeline`, by emitting a warning instead of raising a `ValueError`. This should simplify testing steps without the...

improvement
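
The relaxed check described in this PR can be sketched as follows. This is a minimal illustration, assuming a plain Python class; `ToyStep` and its `pipeline` argument are hypothetical stand-ins rather than distilabel's `_Step`, which per the description performs the check in `model_post_init`.

```python
import warnings
from typing import Optional


class ToyStep:
    """Hypothetical step used for illustration; NOT distilabel's _Step."""

    def __init__(self, name: str, pipeline: Optional[object] = None) -> None:
        self.name = name
        self.pipeline = pipeline
        if pipeline is None:
            # A stricter variant would raise ValueError here; this one only warns.
            warnings.warn(
                f"Step '{name}' was created without a pipeline.",
                UserWarning,
                stacklevel=2,
            )


# Capture warnings so the behaviour can be inspected, e.g. in a unit test.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    step = ToyStep("generate")

print(len(caught), step.pipeline)
```

Warning instead of raising lets a step be constructed in isolation for tests, while still signalling that it is not attached to a pipeline.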

## Description See milestone https://github.com/argilla-io/distilabel/milestone/8

release

**Is your feature request related to a problem? Please describe.** In the generated dataset, we say to rate following the annotation guidelines, but the guidelines are empty. **Describe the solution you'd like**...

enhancement

I want to push my results to Hugging Face with a save frequency of 2000, as in distilabel 0.6.0:

```
freq = 2000
dataset_checkpoint = DatasetCheckpoint(
    path=Path.cwd() / "checkpoint_folder_evol_cn",
    save_frequency=freq,
    strategy='hf-hub',
    extra_kwargs={"repo_id": 'xxx/xxx',...
```

question