
[FEATURE] classifier from prompt and synthetic data

Open davidberenstein1957 opened this issue 7 months ago • 0 comments

Is your feature request related to a problem? Please describe. It would be amazing to include a way to prompt LLMs to generate or distil data for predictive tasks and then directly fine-tune a model on that data.

Describe the solution you'd like: Start with TextCat (text classification):

  • generate text data
  • generate label data (might be combined with the step above)
  • fine-tune a model with AutoTrain or SetFit
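
The three steps above can be sketched roughly as follows. This is only an illustration of the data flow, not a distilabel, AutoTrain, or SetFit implementation: `fake_llm`, `generate_dataset`, `train`, and `predict` are hypothetical names, the LLM is replaced by canned outputs, and the "fine-tuning" step is a trivial word-count classifier standing in for a real trainer.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple


def fake_llm(prompt: str) -> List[str]:
    """Hypothetical stand-in for an LLM call; returns canned texts per label."""
    canned = {
        "positive": ["I love this product", "great service and fast delivery"],
        "negative": ["terrible quality, very disappointed", "the worst support I have had"],
    }
    label = prompt.rsplit(" ", 1)[-1]  # the label is the last word of the prompt
    return canned[label]


def generate_dataset(
    labels: List[str], llm: Callable[[str], List[str]]
) -> List[Tuple[str, str]]:
    """Steps 1 and 2 combined: prompt once per label so every text arrives labelled."""
    dataset = []
    for label in labels:
        for text in llm(f"Write short reviews with sentiment: {label}"):
            dataset.append((text, label))
    return dataset


def train(dataset: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Step 3 placeholder: per-label word counts instead of real fine-tuning."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for text, label in dataset:
        counts[label].update(text.lower().split())
    return dict(counts)


def predict(model: Dict[str, Counter], text: str) -> str:
    """Pick the label whose training texts share the most words with the input."""
    words = text.lower().split()
    return max(model, key=lambda label: sum(model[label][w] for w in words))


dataset = generate_dataset(["positive", "negative"], fake_llm)
model = train(dataset)
print(predict(model, "great product"))
```

In a real pipeline, `fake_llm` would be an actual model call and `train` would hand the generated dataset to AutoTrain or SetFit; the point here is only that generation and labelling can share one prompt per label.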

Describe alternatives you've considered N.A.

Additional context: We first need methods to compose such pipelines; see https://github.com/argilla-io/distilabel/issues/797. The result could look something like the classifier creator in Augmentoolkit: https://github.com/e-p-armstrong/augmentoolkit?tab=readme-ov-file#classifier-creator

davidberenstein1957, Jul 24 '24 10:07