DocumentTooLarge: 'aggregate' command document too large
System information
- OS Platform and Distribution: Linux Ubuntu 20.04
- Python version: Python 3.8.10
- FiftyOne version: FiftyOne v0.21.4, Voxel51, Inc.
- FiftyOne installed from: pip
Describe the problem
I have a dataset with 215,840 images. When importing annotations from CVAT (`fouc.import_annotations`), I get the following error:
---------------------------------------------------------------------------
DocumentTooLarge Traceback (most recent call last)
1.cvat_2_fiftyone_dataset.ipynb, Cell 11, line 1
File ~/.virtualenvs/flash/lib/python3.8/site-packages/fiftyone/utils/cvat.py:266, in import_annotations(sample_collection, project_name, project_id, task_ids, data_path, label_types, insert_new, download_media, num_workers, occluded_attr, group_id_attr, backend, **kwargs)
259 for task_id in task_ids:
260 label_schema = api._get_label_schema(
261 task_id=task_id,
262 occluded_attr=occluded_attr,
263 group_id_attr=group_id_attr,
264 )
--> 266 _download_annotations(
267 dataset,
268 [task_id],
269 cvat_id_map,
270 label_schema,
271 label_types,
272 anno_backend,
273 anno_key,
274 **kwargs,
275 )
276 finally:
277 anno_backend.delete_run(dataset, anno_key)
File ~/.virtualenvs/flash/lib/python3.8/site-packages/fiftyone/utils/cvat.py:395, in _download_annotations(dataset, task_ids, cvat_id_map, label_schema, label_types, anno_backend, anno_key, **kwargs)
393 project_ids = []
394 job_ids = []
--> 395 frame_id_map = {
396 task_id: _build_sparse_frame_id_map(dataset, cvat_id_map[task_id])
397 for task_id in task_ids
...
1032 # There's nothing intelligent we can say
1033 # about size for update and delete
-> 1034 raise DocumentTooLarge(f"{operation!r} command document too large")
DocumentTooLarge: 'aggregate' command document too large
Code used to import the annotations:
fouc.import_annotations(
    dataset,
    task_ids=[51, 52, 54, 55, 56, 64, 65],
    data_path=data_map,
    download_media=False,
)
What areas of FiftyOne does this bug affect?

- [ ] App: FiftyOne application issue
- [x] Core: Core Python library issue
- [ ] Server: FiftyOne server issue
I see the same issue with a smaller dataset of 125,277 images.
Hi @pawani2v, your videos/images may carry too much metadata to import in a single operation; MongoDB caps a single command document at 16 MB, which an import over this many samples can exceed. Importing tasks in smaller batches may solve your issue.
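For example, here is a minimal sketch of a batched import, assuming the same `dataset`, `data_map`, and task IDs from your snippet above (the batch size of 2 is an arbitrary starting point, not a tested value):

import fiftyone.utils.cvat as fouc

task_ids = [51, 52, 54, 55, 56, 64, 65]
batch_size = 2  # arbitrary; reduce further if the error persists

# Import a couple of tasks at a time, which may keep each
# underlying database command under MongoDB's 16 MB limit
for i in range(0, len(task_ids), batch_size):
    fouc.import_annotations(
        dataset,
        task_ids=task_ids[i : i + batch_size],
        data_path=data_map,
        download_media=False,
    )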