coralnet
Sources can get duplicate image names
Example error:
MultipleObjectsReturned at /source/<id>/upload/metadata_preview_ajax/
get() returned more than one Metadata -- it returned 2!
We saw this on October 4, when Bernardo from NOAA reported an error (via the Google Group) during metadata upload. The problem was that one or more non-unique image names were involved in the metadata upload, so when such an image name appeared in a metadata CSV row, CoralNet didn't know which image to assign the metadata to.
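A minimal sketch of why the lookup fails. It uses an in-memory list of `(image_id, source_id, name)` tuples as a stand-in for the database query. `find_image_by_name` and `MultipleMatchesError` are hypothetical names for illustration, not CoralNet's code. A `get()`-style lookup by name must return exactly one image, so two images sharing a name in one source make the CSV row ambiguous, the same situation Django reports as `MultipleObjectsReturned`.

```python
class MultipleMatchesError(Exception):
    """Hypothetical stand-in for Django's MultipleObjectsReturned."""


def find_image_by_name(images, source_id, name):
    """Return the single image ID in `source_id` named `name`.

    `images` is a list of (image_id, source_id, name) tuples; this is an
    illustrative stand-in for a get() query, not CoralNet's actual code.
    """
    matches = [img_id for img_id, src, img_name in images
               if src == source_id and img_name == name]
    if not matches:
        raise LookupError(f"No image named {name!r} in source {source_id}")
    if len(matches) > 1:
        # Same situation as "get() returned more than one Metadata":
        # the metadata row can't be assigned to a single image.
        raise MultipleMatchesError(
            f"{len(matches)} images named {name!r} in source {source_id}")
    return matches[0]


images = [(1, 10, "reef_a.jpg"), (2, 10, "reef_b.jpg"), (3, 10, "reef_a.jpg")]
print(find_image_by_name(images, 10, "reef_b.jpg"))  # 2
try:
    find_image_by_name(images, 10, "reef_a.jpg")
except MultipleMatchesError as e:
    print(e)  # 2 images named 'reef_a.jpg' in source 10
```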
CoralNet is supposed to prevent duplicate image names from getting saved (through upload, metadata edit, etc.), so this indicates some kind of bug. We don't yet know how the bug occurs, or whether it's a consistently repeatable bug or a weird race condition.
If the bug ends up occurring often, with no obvious fix in sight, we might consider writing a script to detect such dupe-name cases more proactively.
Happened again today.
I thought of something to check in the code, and it turned out to be right. It's definitely a bug. The upload preview request checks for dupe image names, but the upload request itself doesn't check.
So you can open the same source's upload page in two browser tabs, select images with the same names in both (each tab gets past the preview step), then start the upload in both. Both uploads will succeed.
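A minimal sketch that replays the two-tab sequence deterministically. A plain list stands in for the source's saved image names, and `names_available`, `upload`, and `two_tabs` are hypothetical helpers, not CoralNet's views. Both previews run before either upload, so both pass; without a re-check in the upload step, both uploads save the same name.

```python
def names_available(saved_names, new_names):
    """Dupe-name check: True if none of new_names is already saved."""
    return not (set(new_names) & set(saved_names))


def upload(saved_names, new_names, recheck):
    """Save new_names into saved_names; return True if accepted.

    With recheck=True, the upload step repeats the dupe check instead of
    trusting the earlier preview.
    """
    if recheck and not names_available(saved_names, new_names):
        return False
    saved_names.extend(new_names)  # stands in for creating Image rows
    return True


def two_tabs(recheck):
    """Replay the two-tab sequence: both previews, then both uploads."""
    saved = []
    names = ["reef_a.jpg"]
    previews = [names_available(saved, names) for _ in range(2)]
    uploads = [upload(saved, names, recheck) for _ in range(2)]
    return previews, uploads, saved


print(two_tabs(recheck=False))
# ([True, True], [True, True], ['reef_a.jpg', 'reef_a.jpg'])
print(two_tabs(recheck=True))
# ([True, True], [True, False], ['reef_a.jpg'])
```

The re-check closes this particular interleaving. However, two truly concurrent upload requests could still both pass the re-check before either one saves, which is why a database-level constraint is worth considering as well.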
We should:
- [ ] Check for dupe names in the upload request itself, not just the preview.
- [ ] See if anything else is needed to be totally safe against race conditions. One way could be moving the `name` field from the `Metadata` model to the `Image` model, and then defining a database-level joint uniqueness constraint on the `source` and `name` fields of the `Image` model. Not sure if there is an easier way.
- [ ] Clean up any other dupe image names that are lurking about. Only a handful of CoralNet URLs actually initiate a 'get metadata for this image name' query, so some dupes may not have caused errors yet. A while back (0ab0d36b357da81e6ab7c88eb58bad8a49595040), we used a data migration to add suffixes of `__dupe-name-1`, `__dupe-name-2`, etc. to detected dupes' names. So we can do something similar this time.
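A minimal sketch of what a database-level joint uniqueness constraint buys. This uses SQLite through Python's `sqlite3` module rather than CoralNet's actual database. The `image` table, its columns, and `save_image` are hypothetical simplifications. In Django, the rough equivalent would be a `UniqueConstraint(fields=['source', 'name'], name=...)` in the `Image` model's `Meta.constraints`, once `name` lives on `Image`. The point is that the database rejects the second insert even if two requests both passed an application-level check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical simplified table: the name is stored on the image itself,
# with a joint uniqueness constraint on (source_id, name).
conn.execute("""
    CREATE TABLE image (
        id INTEGER PRIMARY KEY,
        source_id INTEGER NOT NULL,
        name TEXT NOT NULL,
        UNIQUE (source_id, name)
    )
""")


def save_image(conn, source_id, name):
    """Insert an image row; return False if the name is taken in that source."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO image (source_id, name) VALUES (?, ?)",
                (source_id, name))
        return True
    except sqlite3.IntegrityError:
        # The database itself rejects the duplicate (source_id, name) pair.
        return False


print(save_image(conn, 10, "reef_a.jpg"))  # True
print(save_image(conn, 10, "reef_a.jpg"))  # False: same source and name
print(save_image(conn, 11, "reef_a.jpg"))  # True: different source
```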
I guess I'll add this to the pile of bugs to fix after Beta 2 rollout.
(2020/08/11 edit: happened again today, but seems that the names were fixed by the time I checked)
(2022/07/29 edit: happened again)
Seen again today. Here's some simple dupe-checking code (for a single source) that I wrote a while back and that still works:
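```python
from images.models import Image, Metadata, Source

# Replace <SOURCE ID> with the ID of the source to check.
s = Source.objects.get(pk=<SOURCE ID>)
ms = Metadata.objects.filter(image__source=s)

# After sorting by name, duplicate names end up next to each other,
# so comparing each name with the previous one finds every dupe.
prev_name = None
for m in ms.order_by('name'):
    if prev_name == m.name:
        print(m.name)
    prev_name = m.name
```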
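The same idea could be extended to a proactive check across all sources, plus cleanup renames in the style of the earlier `__dupe-name-N` migration. Below is a minimal sketch, assuming rows are available as `(image_id, source_id, name)` tuples. `find_dupe_names` and `dupe_renames` are hypothetical helpers. The sketch assumes the lowest image ID keeps its name and that the suffix is appended to the end of the name. Where the earlier migration actually placed the suffix isn't stated here, and the sketch doesn't check whether a new name collides with an existing one.

```python
from collections import defaultdict


def find_dupe_names(images):
    """Group image IDs by (source_id, name); keep only groups with >1 image.

    `images` is an iterable of (image_id, source_id, name) tuples, a
    hypothetical stand-in for the image rows of every source.
    """
    groups = defaultdict(list)
    for image_id, source_id, name in images:
        groups[(source_id, name)].append(image_id)
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


def dupe_renames(images):
    """Map image_id -> new name, leaving the lowest ID in each group as-is."""
    renames = {}
    for (_source_id, name), ids in find_dupe_names(images).items():
        for n, image_id in enumerate(ids[1:], start=1):
            renames[image_id] = f"{name}__dupe-name-{n}"
    return renames


images = [
    (1, 10, "reef_a.jpg"),
    (2, 10, "reef_b.jpg"),
    (3, 10, "reef_a.jpg"),
    (4, 11, "reef_a.jpg"),  # same name, different source: not a dupe
    (5, 10, "reef_a.jpg"),
]
print(find_dupe_names(images))  # {(10, 'reef_a.jpg'): [1, 3, 5]}
print(dupe_renames(images))
# {3: 'reef_a.jpg__dupe-name-1', 5: 'reef_a.jpg__dupe-name-2'}
```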