distilabel [FEATURE] Add quick annotation guidelines

Is your feature request related to a problem? Please describe. In the generated dataset, the questions tell annotators to rate following the annotation guidelines, but the guidelines are empty.

Screenshot 2024-04-26 at 11 52 29

Describe the solution you'd like We should include a very brief annotation guideline when setting up the dataset. Maybe reuse the general parts of the UF prompt?


Apr 26 '24 09:04 dvsrepo

Adding default annotation guidelines may be complex if we want this to be extensible, so the easiest option may be ~to just add the task definition within the guidelines i.e. "rate the quality of the responses assuming that those were generated using the following prompt template" or something similar;~ but I think we can also use "Rate ... given instruction based on the annotation guidelines if any" (i.e. adding "if any" at the end).

Edit: forget what I just said; we have no traceability on which task was used. It could be done in a hacky way, but IMO it's not worth it; it would be better to just expose that within the init so that the user can set it instead.
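
A minimal sketch of what "expose that within the init" could look like. The class name `PreferenceDatasetConfig` and its fields are hypothetical and only illustrate the idea; they are not distilabel's or Argilla's actual API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreferenceDatasetConfig:
    """Hypothetical container; not part of distilabel or Argilla."""

    dataset_name: str
    # Unset by default, so no generic text has to be guessed per task.
    guidelines: Optional[str] = None

    def resolved_guidelines(self) -> Optional[str]:
        """Return the stripped guidelines, or None if empty or whitespace."""
        if self.guidelines is None:
            return None
        text = self.guidelines.strip()
        return text or None
```

With this shape, whoever creates the dataset decides whether guidelines exist at all, and downstream code can check `resolved_guidelines()` instead of assuming they are filled in.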

Apr 26 '24 09:04 alvarobartt

> Adding default annotation guidelines may be complex if we want this to be extensible, so the easiest option may be ~to just add the task definition within the guidelines i.e. "rate the quality of the responses assuming that those were generated using the following prompt template" or something similar;~ but I think we can also use "Rate ... given instruction based on the annotation guidelines if any" (i.e. adding "if any" at the end).
>
> Edit: forget what I just said; we have no traceability on which task was used. It could be done in a hacky way, but IMO it's not worth it; it would be better to just expose that within the init so that the user can set it instead.

No. We know it is a preference dataset. I'm just talking about something like:

Rate the quality of the responses to the instructions based on aspects like .... (that's what I meant by reusing some language of the UF prompt).

Either that or simply remove the mention of the guidelines in the questions:

Rate generations-0 given the instruction *
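
The two options for the question title can be sketched as below. The helper `build_rating_title` is hypothetical, and the exact wording of the longer variant is an assumption based on this thread, not taken from the distilabel source.

```python
from typing import Optional


def build_rating_title(generation_name: str, guidelines: Optional[str] = None) -> str:
    """Hypothetical helper: build the title of a rating question.

    The guidelines are mentioned only when they are actually non-empty.
    """
    title = f"Rate {generation_name} given the instruction"
    if guidelines and guidelines.strip():
        title += " based on the annotation guidelines"
    return title
```

Calling `build_rating_title("generations-0")` yields the shorter title shown above, so annotators are never pointed at guidelines that do not exist.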

Apr 26 '24 11:04 dvsrepo

> No. We know it is a preference dataset. I'm just talking about something like:
>
> Rate the quality of the responses to the instructions based on aspects like .... (that's what I meant by reusing some language of the UF prompt).

Fair! Do you prefer that over simply removing the mention of the guidelines? If so, do you have wording in mind that works for most use cases? Otherwise we can just remove those mentions to avoid confusion, since both the question titles and the guidelines can already be edited later within the Argilla UI 👍🏻

Apr 28 '24 11:04 alvarobartt

> > No. We know it is a preference dataset. I'm just talking about something like: Rate the quality of the responses to the instructions based on aspects like .... (that's what I meant by reusing some language of the UF prompt).
>
> Fair! Do you prefer that over simply removing the mention of the guidelines? If so, do you have wording in mind that works for most use cases? Otherwise we can just remove those mentions to avoid confusion, since both the question titles and the guidelines can already be edited later within the Argilla UI 👍🏻

Apologies for the late reply. I agree that the simplest and most maintainable option is to remove the reference to the guidelines from the question titles.

Apr 30 '24 08:04 dvsrepo

No worries at all 👍🏻 I'll create the PR now and invite you to review, thanks for the feedback!

Apr 30 '24 11:04 alvarobartt
