[BUG] [SDK] [v2] StopIteration error when retrieving records with existing responses.
Describe the bug
When updating suggestions on existing records via DatasetRecords.log, a StopIteration error is thrown while trying to iterate over non-existent user_ids.
This happens on the Argilla dev environment deployment, but not on a local server.
Stack trace and code to reproduce the bug
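import argilla as rg
from datasets import load_dataset

# dev: an rg.Argilla client connected to the dev environment deployment
# (connection details omitted), e.g. dev = rg.Argilla(api_url=..., api_key=...)

# add suggestions
dataset = load_dataset(
    "HuggingFaceFW/fineweb-edu", name="sample-10BT", split="train", streaming=True
)
data = list(dataset.take(100))
ds = dev.datasets("fineweb-edu")
suggestions = [
    {"id": record["id"], "int_score": record["int_score"]}
    for record in data
]
ds.records.log(records=suggestions, batch_size=10)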
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
Cell In[7], line 1
----> 1 ds.records.log(records=suggestions, batch_size=10)
File ~/git/argilla/argilla/src/argilla/records/_dataset_records.py:235, in DatasetRecords.log(self, records, mapping, user_id, batch_size)
    233 batch_records = record_models[batch : batch + batch_size]
    234 models, updated = self._api.bulk_upsert(dataset_id=self.__dataset.id, records=batch_records)
--> 235 created_or_updated.extend([Record.from_model(model=model, dataset=self.__dataset) for model in models])
    236 records_updated += updated
    238 records_created = len(created_or_updated) - records_updated
File ~/git/argilla/argilla/src/argilla/records/_dataset_records.py:235, in <listcomp>(.0)
    233 batch_records = record_models[batch : batch + batch_size]
    234 models, updated = self._api.bulk_upsert(dataset_id=self.__dataset.id, records=batch_records)
--> 235 created_or_updated.extend([Record.from_model(model=model, dataset=self.__dataset) for model in models])
    236 records_updated += updated
    238 records_created = len(created_or_updated) - records_updated
File ~/git/argilla/argilla/src/argilla/records/_resource.py:250, in Record.from_model(cls, model, dataset)
    235 @classmethod
    236 def from_model(cls, model: RecordModel, dataset: "Dataset") -> "Record":
    237 """Converts a RecordModel object to a Record object.
    238 Args:
    239 model: A RecordModel object.
(...)
    242 A Record object.
    243 """
    244 return cls(
    245 id=model.external_id,
    246 fields=model.fields,
    247 metadata={meta.name: meta.value for meta in model.metadata},
    248 vectors=[Vector.from_model(model=vector) for vector in model.vectors],
    249 # Responses and their models are not aligned 1-1.
--> 250 responses=[
    251 response
    252 for response_model in model.responses
    253 for response in UserResponse.from_model(response_model, dataset=dataset)
    254 ],
    255 suggestions=[Suggestion.from_model(model=suggestion, dataset=dataset) for suggestion in model.suggestions],
    256 _dataset=dataset,
    257 _server_id=model.id,
    258 )
File ~/git/argilla/argilla/src/argilla/records/_resource.py:253, in <listcomp>(.0)
    235 @classmethod
    236 def from_model(cls, model: RecordModel, dataset: "Dataset") -> "Record":
    237 """Converts a RecordModel object to a Record object.
    238 Args:
    239 model: A RecordModel object.
(...)
    242 A Record object.
    243 """
    244 return cls(
    245 id=model.external_id,
    246 fields=model.fields,
    247 metadata={meta.name: meta.value for meta in model.metadata},
    248 vectors=[Vector.from_model(model=vector) for vector in model.vectors],
    249 # Responses and their models are not aligned 1-1.
    250 responses=[
    251 response
    252 for response_model in model.responses
--> 253 for response in UserResponse.from_model(response_model, dataset=dataset)
    254 ],
    255 suggestions=[Suggestion.from_model(model=suggestion, dataset=dataset) for suggestion in model.suggestions],
    256 _dataset=dataset,
    257 _server_id=model.id,
    258 )
File ~/git/argilla/argilla/src/argilla/responses.py:170, in UserResponse.from_model(cls, model, dataset)
    167 if isinstance(question, RankingQuestion):
    168 answer.value = cls.__ranking_from_model_value(answer.value) # type: ignore
--> 170 return cls(answers=answers)
File ~/git/argilla/argilla/src/argilla/responses.py:129, in UserResponse.__init__(self, answers, client, _record)
    123 super().__init__(client=client)
    125 self._record = _record
    126 self._model = UserResponseModel(
    127 values=self.__responses_as_model_values(answers),
    128 status=self._compute_status_from_answers(answers),
--> 129 user_id=self._compute_user_id_from_answers(answers),
    130 )
File ~/git/argilla/argilla/src/argilla/responses.py:200, in UserResponse._compute_user_id_from_answers(self, answers)
    198 if len(user_ids) > 1:
    199 raise ValueError("Multiple user_ids found in user answers.")
--> 200 return next(iter(user_ids))
StopIteration:
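The final frame points at the likely root cause: UserResponse._compute_user_id_from_answers collects the answers' user IDs into a set and returns next(iter(user_ids)). If a response returned by the server has no answers (an assumption about what the dev deployment sends back for these records, for example a response stored without values), the set is empty and next() raises StopIteration instead of returning an ID. The sketch below reproduces that pattern in isolation using a hypothetical Answer class and hypothetical helper names, and shows one possible defensive variant that passes a default to next(). It is not Argilla's code and not an official fix.

```python
from dataclasses import dataclass
from typing import List, Optional
from uuid import UUID, uuid4


@dataclass
class Answer:
    """Hypothetical stand-in for one question answer that carries its author's id."""
    question_name: str
    value: object
    user_id: UUID


def compute_user_id_strict(answers: List[Answer]) -> UUID:
    """Mirrors the failing pattern from the traceback (responses.py:200)."""
    user_ids = {answer.user_id for answer in answers}
    if len(user_ids) > 1:
        raise ValueError("Multiple user_ids found in user answers.")
    # With no answers the set is empty, so next() raises StopIteration.
    return next(iter(user_ids))


def compute_user_id_safe(
    answers: List[Answer], fallback: Optional[UUID] = None
) -> Optional[UUID]:
    """Hypothetical defensive variant: return `fallback` (e.g. a user id already
    known from the server-side response model) when there are no answers."""
    user_ids = {answer.user_id for answer in answers}
    if len(user_ids) > 1:
        raise ValueError("Multiple user_ids found in user answers.")
    # next() with a default argument never raises StopIteration.
    return next(iter(user_ids), fallback)


if __name__ == "__main__":
    uid = uuid4()
    try:
        compute_user_id_strict([])
    except StopIteration:
        print("StopIteration on empty answers")
    print(compute_user_id_safe([], fallback=uid) == uid)
    print(compute_user_id_safe([Answer("int_score", 3, uid)]) == uid)
```

Taking the user ID from the response model itself, rather than inferring it only from the answers, is one way to supply the fallback; whether that matches the SDK's intended semantics for empty responses is an assumption here.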
Expected behavior
I would expect Argilla to return the user IDs.
Environment:
- Argilla Version [e.g. 1.0.0]:
- ElasticSearch Version [e.g. 7.10.2]:
- Docker Image (optional) [e.g. argilla:v1.0.0]: