argilla · [FEATURE] add support for inferring `FeedbackDataset` structure in `from_huggingface` for transformer models

Open davidberenstein1957 opened this issue 2 years ago • 4 comments

Is your feature request related to a problem? Please describe.
I would like to focus on HF models.

Describe the solution you'd like
https://huggingface.co/models has models categorized by task, so the structure of a `FeedbackDataset` could be inferred directly from a model ID, e.g.:

import argilla as rg

# Proposed usage: infer the dataset structure from a Hub model ID
rg.FeedbackDataset.from_huggingface("ProsusAI/finbert")

Internally, something like the following should happen, but ideally we should avoid downloading the entire model and load only its config.

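import argilla as rg
from transformers import pipeline

# Load a sentiment-analysis pipeline (this downloads the full model weights)
name = "sentiment-analysis"
pipe = pipeline(name)

# Derive the label set and the multi-label flag from the model config
ds = rg.FeedbackDataset.for_text_classification(
    labels=list(pipe.model.config.id2label.values()),
    multi_label=pipe.model.config.problem_type == "multi_label_classification",
)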
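The wish to avoid downloading the whole model can likely be met by fetching only the model's `config.json`, which is where transformers stores `id2label` and `problem_type` (`transformers.AutoConfig.from_pretrained(model_id)` also loads just the config). Below is a minimal sketch under that assumption. `fetch_model_config`, `infer_text_classification_spec` and `TextClassificationSpec` are hypothetical helper names, not part of Argilla or transformers. The sketch uses `huggingface_hub.hf_hub_download` to fetch a single file and returns the labels in class-index order.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class TextClassificationSpec:
    """Hypothetical container for the arguments a text-classification template needs."""
    labels: List[str]
    multi_label: bool


def fetch_model_config(repo_id: str) -> Dict[str, Any]:
    """Download only config.json from the Hub (no model weights) and parse it."""
    # Imported lazily so the pure helper below works without huggingface_hub installed
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def infer_text_classification_spec(config: Dict[str, Any]) -> TextClassificationSpec:
    """Derive labels and the multi-label flag from a transformers config dict."""
    id2label = config.get("id2label")
    if not id2label:
        raise ValueError("config has no id2label mapping; cannot infer labels")
    # config.json stores ids as string keys ("0", "1", ...); sorting numerically
    # keeps "10" after "2" and yields labels in class-index order
    labels = [id2label[key] for key in sorted(id2label, key=int)]
    # A missing problem_type is treated as single-label (a heuristic)
    multi_label = config.get("problem_type") == "multi_label_classification"
    return TextClassificationSpec(labels=labels, multi_label=multi_label)


# Offline example with a hand-written config dict (not a real model's config)
example_config = {"id2label": {"0": "negative", "1": "positive"}}
print(infer_text_classification_spec(example_config))
# TextClassificationSpec(labels=['negative', 'positive'], multi_label=False)
```

With network access, `spec = infer_text_classification_spec(fetch_model_config("ProsusAI/finbert"))` would provide `labels=spec.labels` and `multi_label=spec.multi_label` for `rg.FeedbackDataset.for_text_classification`. Checkpoints do not always set `problem_type`, so the multi-label flag derived this way is a best guess rather than a guarantee.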

Oct 25 '23 03:10 davidberenstein1957

Hi @davidberenstein1957, I think that reusing from_huggingface not just to load Argilla datasets dumped to the Hugging Face Hub, but also to load a configuration for any given model, can be confusing to users and also confusing internally, code-wise. So if this goes ahead, I think we need to discuss a proper method for doing so. Also, I assume the idea you propose is to re-label already labelled datasets? If you could elaborate further, e.g. in Notion, and share it with the team, that would be great!

Oct 25 '23 06:10 alvarobartt

Hi @alvarobartt, this is not something that is currently happening or that was mentioned anywhere. I was just brainstorming: we have received a lot of comments that people don't understand how to use and configure a dataset, so things like the task_templates could help them. It is not meant for re-labelling a dataset but rather for easily configuring datasets and linking them to models, similar to the reasoning behind using a default embedding_model and text-description metadata for datasets.

Oct 25 '23 06:10 davidberenstein1957

I agree with @alvarobartt that from_huggingface might be confusing. I think this might be better placed in the task templates somehow, but we might also want to look at the bigger picture: associating Hub model IDs with datasets so they can be used in different parts of the product (retraining, inference, etc.).
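Following the suggestion to route this through the task templates instead of from_huggingface, a first step could be choosing a template from the model's Hub task. A minimal sketch, assuming the task is read from Hub metadata (e.g. `huggingface_hub.model_info(repo_id).pipeline_tag`) and that the listed Argilla 1.x `FeedbackDataset` template names are correct for the installed version. `PIPELINE_TAG_TO_TEMPLATE` and `suggest_template` are hypothetical names used only for illustration.

```python
from typing import Dict, Optional

# Hypothetical mapping from Hugging Face Hub `pipeline_tag` values to Argilla
# FeedbackDataset task-template constructor names. Only tags with an obvious
# counterpart are listed; the template names are assumptions to verify.
PIPELINE_TAG_TO_TEMPLATE: Dict[str, str] = {
    "text-classification": "for_text_classification",
    "question-answering": "for_question_answering",
    "summarization": "for_summarization",
    "translation": "for_translation",
    "sentence-similarity": "for_sentence_similarity",
}


def suggest_template(pipeline_tag: Optional[str]) -> Optional[str]:
    """Return the name of a task template matching a model's pipeline_tag, or None."""
    if pipeline_tag is None:
        # Some Hub models have no pipeline_tag at all
        return None
    return PIPELINE_TAG_TO_TEMPLATE.get(pipeline_tag)


for tag in ["text-classification", "summarization", "image-classification", None]:
    print(tag, "->", suggest_template(tag))
```

Keeping the lookup as a plain mapping keeps model discovery separate from dataset loading, so from_huggingface could continue to mean "load an Argilla dataset from the Hub" while a dedicated entry point handles models.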

Oct 25 '23 07:10 dvsrepo

This issue is stale because it has been open for 90 days with no activity.

Jan 29 '24 01:01 github-actions[bot]

This issue was closed because it has been inactive for 30 days since being marked as stale.

Apr 02 '24 01:04 github-actions[bot]