distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
**Describe the bug** Generating larger datasets with `LoadDataFromDicts` leads to underutilization of the GPU during the `TextGeneration` step. **To Reproduce** Setting `N_SAMPLES` to a small value in the code below...
## Description To run the `vLLM` tests within the CI, we should install the CPU build of `vllm` as per their official docs at https://docs.vllm.ai/en/latest/getting_started/cpu-installation.html cc @gabrielmbmb
**Is your feature request related to a problem? Please describe.** Along with the addition of `raw_output_` and the proposed `raw_input_` (https://github.com/argilla-io/distilabel/issues/698), I think it would be nice to align this...
## Description Currently the `logging` handler created for the LLMs is named `distilabel.llm.{llm.model_name}`, but the `llm.model_name` property shouldn't be used since it can be confusing or even expose a...
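As a rough illustration of the naming concern above, the sketch below derives the logger name from the LLM's class name instead of its `model_name`. This is only one possible alternative, not necessarily what the issue proposes; `DummyLLM` and `llm_logger` are hypothetical names and are not part of distilabel.

```python
import logging


class DummyLLM:
    """Hypothetical stand-in for an LLM wrapper; not a distilabel class."""

    # A model name like this may be long, ambiguous, or sensitive.
    model_name = "org/private-model-name"


def llm_logger(llm):
    """Return a logger named after the LLM's class rather than its model name."""
    return logging.getLogger(f"distilabel.llm.{type(llm).__name__}")


logger = llm_logger(DummyLLM())
print(logger.name)
```

Under this assumption the logger name stays stable across model names and never includes the model identifier itself.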
This PR adds a general step that enables the use of the OpenAI Batch API as discussed in #538. The Step follows roughly the same API as a Task but...
## Description TODO ### Ideas * Add `use_cache` flag at `Step` level to avoid caching
- Went for lancedb because it works in memory.
- @frascuchon as a follow-up we can consider adding argilla based on your vector search PR :)

Do vector search using...
This PR adds llama-cpp support to create embeddings.

```
from distilabel.embeddings import LlamaCppEmbeddings

embeddings = LlamaCppEmbeddings(model="second-state/all-MiniLM-L6-v2-Q2_K.gguf")
embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
# [
#   [-0.05447685346007347, -0.01623094454407692, ...],
#   ...
```
Fix impute output when the `output_mapping` is not empty
**Is your feature request related to a problem? Please describe.** When running a TextGeneration task on a big dataset using the OpenAI API, I'm getting the following error: `openai.RateLimitError: Error...
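A common way to cope with rate-limit errors like the one reported above is to retry with exponential backoff. The following is a minimal, generic sketch of that idea and not distilabel's or OpenAI's implementation: `RateLimitError` here is a local stand-in for the provider's exception, and `call_with_backoff` and its parameters are hypothetical names chosen for illustration.

```python
import random
import time


class RateLimitError(Exception):
    """Local stand-in for a provider's rate-limit exception (assumption)."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, jitter=0.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted; surface the original error
            # Delay doubles each attempt; optional jitter spreads out retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, jitter)
            sleep(delay)
```

In practice `fn` would wrap a single API request, and a non-zero `jitter` helps keep many concurrent workers from retrying at the same instant.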