
[DOCS] "Bulk Labeling Multimodal Data" Notebook outdated

Open trojblue opened this issue 4 months ago • 1 comments

Which page or section is this issue related to?

https://github.com/argilla-io/argilla/blob/develop/docs/_source/tutorials/notebooks/labelling-textclassification-sentencetransformers-semantic.ipynb

In the notebook I found several things that are incompatible with the current version of argilla:

1. the dependency:

%pip install argilla "setfit~=0.2.0" "datasets~=2.3.0" transformers sentence-transformers -qqq
  • When the dependencies are installed with "setfit~=0.2.0" "datasets~=2.3.0", the line import argilla as rg fails: argilla tries to import an Error class from datasets that does not exist in that version (I forgot the exact one).
  • The solution is to remove the version limits, after which I have datasets 3.0.1 and setfit 1.1.0.
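To see which versions actually ended up installed before and after removing the pins, something like the following could help. It is only a sketch using the standard library's importlib.metadata; the helper name installed_versions is made up for illustration, and the package list is just the three packages mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing.

    Hypothetical helper for illustration; not part of argilla or the notebook.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Packages taken from the notebook's pip install line.
    for name, ver in installed_versions(["argilla", "datasets", "setfit"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in the notebook's kernel shows whether the environment still has the old pinned releases or the newer ones.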

2. the init:

rg.init(
    api_url="https://localhost:6900",
    api_key="admin.apikey"
)

This raises AttributeError: module 'argilla' has no attribute 'init'. The correct way to initialize the client now seems to be:

client = rg.Argilla(
    api_url="some_url",
    api_key="argilla.apikey"
)

3. the dataset:

The dataset used in the notebook (burtenshaw/electronics) is no longer available on Hugging Face:

ELECTRONICS_DATASET = "burtenshaw/electronics"
dataset = load_dataset(ELECTRONICS_DATASET)
labels = dataset["labelled"].features["label"].names
int2str = dataset["labelled"].features["label"].int2str

I haven't gone any further into the notebook, so there could be more issues after this point. For future reference, I'm currently on argilla 2.2.2:

Name: argilla
Version: 2.2.2
Summary: The Argilla python server SDK
Home-page: 
Author: 
Author-email: Argilla <[email protected]>
License: Apache 2.0
Location: /root/miniconda3/lib/python3.10/site-packages
Requires: datasets, httpx, huggingface_hub, pillow, pydantic, rich, tqdm
Required-by:

trojblue, Oct 02 '24 08:10