distilabel issues

Results: 126 distilabel issues

Sorted by most recently updated

Load hub dataset modification to work offline and change to not streaming by default

This PR changes the default behaviour of `LoadHubDataset` to use `streaming=False`, and tries to fetch the column info from the cached dataset when one is found. Closes #561,...

plaguss

improvement

DSPy as a Step

This draft PR proposes a way of using a DSPy prediction module as a text generation step. The advantage of this is that text generation could use an optimised, evaluated,...

burtenshaw

Add `routing_batch_function` argument to `connect`

WIP (work in progress)

gabrielmbmb

Extend `LLM` kwargs to align with counterparts

This PR aligns the kwargs of some of the implemented `LLM` subclasses with those of their underlying engine counterparts, so that all the kwargs can be provided to the `LLM`...

alvarobartt

fix

improvement

There should be an option to pass `n_ctx` to `Llama` from `llama_cpp`

**Is your feature request related to a problem? Please describe.** I appreciate the work distilabel is doing to make it easier for the community to produce high-quality datasets. Thank you!...

amritsingh183

improvement

[BUG] `pipeline.log` cache location not consistent within the same `Pipeline`

**Describe the bug** Apparently, the cache location differs in the `Pipeline.run` method before and after calling `super().run`: the signature is updated, which modifies the path, so...

alvarobartt

bug

Make `pipeline` argument of `Step` optional

Makes `_Step.model_post_init` less strict, so that steps can be instantiated without a `Pipeline`. In that case it emits a warning instead of raising a `ValueError`. This should simplify testing steps without the...

plaguss

improvement

`distilabel` v1.1

See milestone https://github.com/argilla-io/distilabel/milestone/8

alvarobartt

release

[FEATURE] Add quick annotation guidelines

**Is your feature request related to a problem? Please describe.** The generated dataset asks annotators to rate according to the annotation guidelines, but the guidelines are empty. **Describe the solution you'd like**...

dvsrepo

enhancement

Question about checkpoint strategy

I want to push my results to the Hugging Face Hub with a save frequency of 2000, as in distilabel 0.6.0:

```
freq = 2000
dataset_checkpoint = DatasetCheckpoint(path=Path.cwd() / "checkpoint_folder_evol_cn", save_frequency=freq, strategy = 'hf-hub', extra_kwargs={"repo_id": 'xxx/xxx',...
```

YueWu0301

question
