cvat
Attach data to a task: better MIME type detection
Actions before raising this issue
- [X] I searched the existing issues and did not find anything similar.
- [X] I read/searched the docs
Is your feature request related to a problem? Please describe.
Context
I am uploading image files via https://app.cvat.ai/api/docs/#tag/tasks/operation/tasks_create_data (using the client_files parameter).
In my case, my image files are stored on disk in a content-addressable manner, mimicking how git stores and names files. For example, a JPEG file could be stored as /var/misc/images/1f/ec4f5cee029f96c1e9eddd09821a51c0a9f80a.
Problem
The problem lies in the CVAT engine's MIME type detection, which relies solely on file extensions:
- https://github.com/cvat-ai/cvat/blob/f93d58c1ca9401daeee5beba5d5f79ace975c02b/cvat/apps/engine/task.py#L215-L231
- https://github.com/cvat-ai/cvat/blob/f93d58c1ca9401daeee5beba5d5f79ace975c02b/cvat/apps/engine/media_extractors.py#L859-L863
E.g. _is_image builds upon https://docs.python.org/3/library/mimetypes.html#mimetypes.guess_type:
```python
import mimetypes

def _is_image(path):
    mime = mimetypes.guess_type(path)
    # Exclude vector graphic images because Pillow cannot work with them
    return mime[0] is not None and mime[0].startswith('image') and \
        not mime[0].startswith('image/svg')
```
tl;dr
Since my files have no extension, guess_type returns (None, None) for them, so all the uploaded image files get ignored.
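A quick way to see the failure, using the example path from above: mimetypes.guess_type only looks at the suffix of the path, so an extensionless content-addressed name yields no type at all, while the same name with a suffix is recognized.

```python
import mimetypes

# Example content-addressed path from this issue (no file extension)
path = "/var/misc/images/1f/ec4f5cee029f96c1e9eddd09821a51c0a9f80a"

print(mimetypes.guess_type(path))              # (None, None): nothing to go on
print(mimetypes.guess_type(path + ".jpg")[0])  # image/jpeg: suffix is all that matters
```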
Describe the solution you'd like
I think it would be great if MIME type detection could be extended to support magic-number detection (file headers), e.g. using https://github.com/ahupp/python-magic or an equivalent. In other words, detection should not be limited to file extensions (.jpg, etc.).
NB: I am talking about images here, but the same could of course be done for other media types.
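A minimal sketch of what header-based detection could look like, assuming it is used only as a fallback when the extension gives no answer. To stay dependency-free it hand-rolls a few well-known image signatures rather than calling python-magic (whose `magic.from_file(path, mime=True)` would cover far more formats). The names `sniff_mime` and `is_image` are hypothetical and not part of CVAT.

```python
import mimetypes
from typing import Optional

# Leading-byte signatures ("magic numbers") of a few common raster formats.
_SIGNATURES = (
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
)

def sniff_mime(header: bytes) -> Optional[str]:
    """Guess a MIME type from the first bytes of a file, or return None."""
    for signature, mime in _SIGNATURES:
        if header.startswith(signature):
            return mime
    # WebP is a RIFF container: "RIFF" <4-byte size> "WEBP"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "image/webp"
    return None

def is_image(path: str) -> bool:
    """Extension-based check first; read the file header only when that fails."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        with open(path, "rb") as f:
            mime = sniff_mime(f.read(16))
    # Keep the existing exclusion of SVG, which Pillow cannot open
    return mime is not None and mime.startswith("image") and \
        not mime.startswith("image/svg")
```

Trying the extension first keeps the current behaviour for files that are already named properly; the extra cost only applies to extensionless files, and it is a single 16-byte read.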
Describe alternatives you've considered
As a workaround, I am forced to rename the files (adding an extension) at upload time.
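The renaming does not necessarily have to happen on disk. In a multipart/form-data upload the server only receives the filename given in each part, not the local path, so presumably it is enough to send an extension-bearing name. The sketch below assumes the git-style layout `<root>/<2-char prefix>/<rest of hash>` and that all files share one known extension. `upload_name` and `client_file_parts` are hypothetical helpers, and the `client_files[i]` field naming is an assumption about how the endpoint expects the form to be encoded.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

def upload_name(path: str, ext: str = ".jpg") -> str:
    """Build an extension-bearing upload name from a content-addressed path.

    The 2-char prefix directory is glued back onto the file name so that
    names stay unique across prefix directories.
    """
    p = PurePosixPath(path)
    return f"{p.parent.name}{p.name}{ext}"

def client_file_parts(paths: List[str], ext: str = ".jpg") -> List[Tuple[str, Tuple[str, str]]]:
    """Pair each on-disk path with the (assumed) form field and upload name."""
    return [
        (f"client_files[{i}]", (upload_name(p, ext), p))
        for i, p in enumerate(paths)
    ]
```

With the requests library, for instance, these parts could be passed as `files=[(field, (name, open(p, "rb"))) for field, (name, p) in parts]`; authentication and the remaining request fields are omitted here.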
Additional context
No response