distilabel
[FEATURE] classifier from prompt and synthetic data
Is your feature request related to a problem? Please describe. It would be amazing to include a way to prompt LLMs and distil synthetic data from them for predictive tasks, and then directly fine-tune a model on that data.
Describe the solution you'd like Start with TextCat:
- generate text data
- generate label data (might be combined with the step above)
- fine-tune a model with AutoTrain or SetFit
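The three steps above can be sketched roughly as follows. This is only an illustration of the intended flow, not distilabel's API: `generate_texts`, `train_classifier`, and `fake_llm` are hypothetical names, the LLM is replaced by canned completions so the snippet runs offline, and the "fine-tune" step is a trivial word-count classifier standing in for a real AutoTrain or SetFit run.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an LLM call: maps a prompt to a completion.
LLM = Callable[[str], str]


def generate_texts(
    llm: LLM, task: str, labels: List[str], n_per_label: int
) -> List[Tuple[str, str]]:
    """Steps 1 and 2 combined: prompt once per label so each text arrives labelled."""
    rows = []
    for label in labels:
        for i in range(n_per_label):
            prompt = f"Task: {task}\nWrite example #{i} of a text with label '{label}'."
            rows.append((llm(prompt), label))
    return rows


def train_classifier(rows: List[Tuple[str, str]]) -> Callable[[str], str]:
    """Step 3 placeholder: per-label word counts, NOT a real AutoTrain/SetFit fine-tune."""
    counts: Dict[str, Counter] = {}
    for text, label in rows:
        counts.setdefault(label, Counter()).update(text.lower().split())

    def predict(text: str) -> str:
        words = text.lower().split()
        # Pick the label whose synthetic texts share the most words with the input.
        return max(counts, key=lambda lab: sum(counts[lab][w] for w in words))

    return predict


def fake_llm(prompt: str) -> str:
    # Canned completions (assumption) so the sketch runs without a model.
    if "'positive'" in prompt:
        return "great product, works really well"
    return "terrible product, broke after a day"


rows = generate_texts(
    fake_llm, "sentiment of product reviews", ["positive", "negative"], n_per_label=3
)
predict = train_classifier(rows)
print(predict("this works great"))
```

Generating the text conditioned on the label is one way to merge the first two steps; generating unlabelled text first and labelling it in a second LLM pass would be the alternative.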
Describe alternatives you've considered N.A.
Additional context We first need methods to compose such pipelines. See: https://github.com/argilla-io/distilabel/issues/797. The result could be something like: https://github.com/e-p-armstrong/augmentoolkit?tab=readme-ov-file#classifier-creator