[BUG] [SDK] [v2] StopIteration error when retrieving records with existing responses.

Open burtenshaw opened this issue 8 months ago • 1 comments

Describe the bug When updating suggestions on records via DatasetRecords.log, a StopIteration error is raised: UserResponse._compute_user_id_from_answers calls next(iter(user_ids)) on an empty collection of user_ids.

This happens on the Argilla dev environment deployment, but not on a local server.

Stacktrace and Code to create the bug

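# add suggestions
from datasets import load_dataset

# `dev` is an Argilla client already connected to the dev deployment (setup not shown).

# Stream the sample split and keep the first 100 rows.
dataset = load_dataset(
    "HuggingFaceFW/fineweb-edu", name="sample-10BT", split="train", streaming=True
)
data = list(dataset.take(100))

# Build one update per record, keyed by the record id, and log them in batches.
ds = dev.datasets("fineweb-edu")
suggestions = [
    {"id": record["id"], "int_score": record["int_score"]}
    for record in data
]
ds.records.log(records=suggestions, batch_size=10)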
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
Cell In[7], line 1
----> 1 ds.records.log(records=suggestions, batch_size=10)

File ~/git/argilla/argilla/src/argilla/records/_dataset_records.py:235, in DatasetRecords.log(self, records, mapping, user_id, batch_size)
    233     batch_records = record_models[batch : batch + batch_size]
    234     models, updated = self._api.bulk_upsert(dataset_id=self.__dataset.id, records=batch_records)
--> 235     created_or_updated.extend([Record.from_model(model=model, dataset=self.__dataset) for model in models])
    236     records_updated += updated
    238 records_created = len(created_or_updated) - records_updated

File ~/git/argilla/argilla/src/argilla/records/_dataset_records.py:235, in <listcomp>(.0)
    233     batch_records = record_models[batch : batch + batch_size]
    234     models, updated = self._api.bulk_upsert(dataset_id=self.__dataset.id, records=batch_records)
--> 235     created_or_updated.extend([Record.from_model(model=model, dataset=self.__dataset) for model in models])
    236     records_updated += updated
    238 records_created = len(created_or_updated) - records_updated

File ~/git/argilla/argilla/src/argilla/records/_resource.py:250, in Record.from_model(cls, model, dataset)
    235 @classmethod
    236 def from_model(cls, model: RecordModel, dataset: "Dataset") -> "Record":
    237     """Converts a RecordModel object to a Record object.
    238     Args:
    239         model: A RecordModel object.
   (...)
    242         A Record object.
    243     """
    244     return cls(
    245         id=model.external_id,
    246         fields=model.fields,
    247         metadata={meta.name: meta.value for meta in model.metadata},
    248         vectors=[Vector.from_model(model=vector) for vector in model.vectors],
    249         # Responses and their models are not aligned 1-1.
--> 250         responses=[
    251             response
    252             for response_model in model.responses
    253             for response in UserResponse.from_model(response_model, dataset=dataset)
    254         ],
    255         suggestions=[Suggestion.from_model(model=suggestion, dataset=dataset) for suggestion in model.suggestions],
    256         _dataset=dataset,
    257         _server_id=model.id,
    258     )

File ~/git/argilla/argilla/src/argilla/records/_resource.py:253, in <listcomp>(.0)
    235 @classmethod
    236 def from_model(cls, model: RecordModel, dataset: "Dataset") -> "Record":
    237     """Converts a RecordModel object to a Record object.
    238     Args:
    239         model: A RecordModel object.
   (...)
    242         A Record object.
    243     """
    244     return cls(
    245         id=model.external_id,
    246         fields=model.fields,
    247         metadata={meta.name: meta.value for meta in model.metadata},
    248         vectors=[Vector.from_model(model=vector) for vector in model.vectors],
    249         # Responses and their models are not aligned 1-1.
    250         responses=[
    251             response
    252             for response_model in model.responses
--> 253             for response in UserResponse.from_model(response_model, dataset=dataset)
    254         ],
    255         suggestions=[Suggestion.from_model(model=suggestion, dataset=dataset) for suggestion in model.suggestions],
    256         _dataset=dataset,
    257         _server_id=model.id,
    258     )

File ~/git/argilla/argilla/src/argilla/responses.py:170, in UserResponse.from_model(cls, model, dataset)
    167     if isinstance(question, RankingQuestion):
    168         answer.value = cls.__ranking_from_model_value(answer.value)  # type: ignore
--> 170 return cls(answers=answers)

File ~/git/argilla/argilla/src/argilla/responses.py:129, in UserResponse.__init__(self, answers, client, _record)
    123 super().__init__(client=client)
    125 self._record = _record
    126 self._model = UserResponseModel(
    127     values=self.__responses_as_model_values(answers),
    128     status=self._compute_status_from_answers(answers),
--> 129     user_id=self._compute_user_id_from_answers(answers),
    130 )

File ~/git/argilla/argilla/src/argilla/responses.py:200, in UserResponse._compute_user_id_from_answers(self, answers)
    198 if len(user_ids) > 1:
    199     raise ValueError("Multiple user_ids found in user answers.")
--> 200 return next(iter(user_ids))

StopIteration:
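
The last frame suggests the failure comes from calling next(iter(user_ids)) on an empty collection, which raises StopIteration when no default is given. The sketch below reproduces that pattern in isolation and shows one possible guard. It is not Argilla's code or its actual fix: the function names are hypothetical, user_ids is assumed to be reducible to a set of UUIDs, and whether the response model accepts a missing user_id is not confirmed by this report.

```python
from typing import Iterable, Optional
from uuid import UUID, uuid4


def first_user_id_strict(user_ids: Iterable[UUID]) -> UUID:
    """Hypothetical stand-in for the failing pattern in the traceback."""
    unique = set(user_ids)
    if len(unique) > 1:
        raise ValueError("Multiple user_ids found in user answers.")
    # With an empty set, next() has nothing to return and raises StopIteration.
    return next(iter(unique))


def first_user_id_or_none(user_ids: Iterable[UUID]) -> Optional[UUID]:
    """Same check, but returns None when no user id is present (an assumption)."""
    unique = set(user_ids)
    if len(unique) > 1:
        raise ValueError("Multiple user_ids found in user answers.")
    # Passing a default to next() avoids StopIteration on an empty set.
    return next(iter(unique), None)


if __name__ == "__main__":
    try:
        first_user_id_strict([])
    except StopIteration:
        print("strict version raised StopIteration on empty input")

    print(first_user_id_or_none([]))
    print(first_user_id_or_none([uuid4()]))
```

Returning None only moves the question upstream: it would still need to be decided why the dev deployment returns responses without user ids while a local server does not.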

Expected behavior I would expect Argilla to return the user ids for the existing responses, so that the records load without raising an error.

burtenshaw, Jun 18 '24 11:06