computervision-recipes
[BUG] Redesign utils_cv.common.data.unzip_url() for unusual scenario
Description
Usually, a zipball unzips into a directory with the same name minus the .zip extension. For example, all files in odFridgeObjectsTiny.zip end up in the odFridgeObjectsTiny directory after being unzipped. But this is not always the case: after unzipping annotations_trainval2017.zip, its files are put under the annotations directory, not annotations_trainval2017. As the session below shows, unzip_url() still returns the annotations_trainval2017 path, which does not exist.
In which platform does it happen?
All platforms
How do we replicate the issue?
>>> from utils_cv.common.data import unzip_url
>>> url = "http://images.cocodataset.org/annotations/annotations_trainval2017.zip"
>>> path = unzip_url(url, exist_ok=True)
>>> path
'/home/simon/repo/fork-cvbp/data/annotations_trainval2017'
>>> from pathlib import Path
>>> Path(path).is_dir()
False
>>> from pprint import pprint
>>> pprint([i for i in Path(path).parent.iterdir() if i.is_dir()])
[PosixPath('/home/simon/repo/fork-cvbp/data/annotations'),
 PosixPath('/home/simon/repo/fork-cvbp/data/PennFudanPed'),
 PosixPath('/home/simon/repo/fork-cvbp/data/odFridgeObjects'),
 PosixPath('/home/simon/repo/fork-cvbp/data/odFridgeObjectsTiny')]
>>> from zipfile import ZipFile
>>> a = ZipFile(Path(path).parent / url.split('/')[-1])
>>> a.printdir()
File Name                                             Modified             Size
annotations/instances_train2017.json           2017-09-01 19:02:24    469785474
annotations/instances_val2017.json             2017-09-01 19:02:32     19987840
annotations/captions_train2017.json            2017-09-01 19:04:56     91865115
annotations/captions_val2017.json              2017-09-01 19:04:58      3872473
annotations/person_keypoints_train2017.json    2017-09-01 19:04:32    238884731
annotations/person_keypoints_val2017.json      2017-09-01 19:04:38     10020657
$ ls /home/simon/repo/fork-cvbp/data/
annotations odFridgeObjectsTiny.zip
annotations_trainval2017.zip odFridgeObjects.zip
odFridgeObjects PennFudanPed
odFridgeObjectsTiny PennFudanPed.zip
$ ls /home/simon/repo/fork-cvbp/data/annotations
captions_train2017.json instances_val2017.json
captions_val2017.json person_keypoints_train2017.json
instances_train2017.json person_keypoints_val2017.json
Expected behavior (i.e. solution)
It would be better to redesign utils_cv.common.data.unzip_url() to handle this scenario without changing the existing code that uses utils_cv.common.data.unzip_url(), so as not to degrade the user experience.
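One possible direction, sketched here only as an illustration: inspect the archive's member names and derive the directory from them instead of from the zip file name. The helper name resolve_unzipped_dir and its dest_dir parameter are hypothetical and not part of utils_cv. The fallback to the zip name without its extension is an assumption that mirrors the convention described above.

```python
from pathlib import Path, PurePosixPath
from zipfile import ZipFile


def resolve_unzipped_dir(zip_path, dest_dir=None):
    """Guess the directory an archive's files land in once extracted.

    Hypothetical helper (not part of utils_cv). If every member sits under
    one shared top-level directory, that directory is returned. Otherwise
    the result falls back to the zip name without its extension (assumed
    convention).
    """
    zip_path = Path(zip_path)
    # Assumption: the archive is extracted next to the zip file by default.
    dest_dir = Path(dest_dir) if dest_dir is not None else zip_path.parent
    fallback = dest_dir / zip_path.stem

    with ZipFile(zip_path) as zf:
        infos = zf.infolist()

    roots = set()
    for info in infos:
        # Zip member names always use forward slashes.
        parts = PurePosixPath(info.filename).parts
        if not parts:
            continue
        if len(parts) == 1 and not info.is_dir():
            # A file at the top level means there is no wrapping directory.
            return fallback
        roots.add(parts[0])

    if len(roots) == 1:
        return dest_dir / roots.pop()
    return fallback
```

With a helper like this, unzip_url() could keep its current signature and return resolve_unzipped_dir(zip_path) after extraction, so existing callers would not need to change. Whether that fits the actual implementation of unzip_url() has not been checked here.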