[FEATURE] Tool to upload all image files to a hub store
🚨🚨 Feature Request
- [ ] Related to an existing Issue
- [x] A new implementation (Improvement, Extension)
If your feature will improve HUB
Create a tool that will recursively walk a given local directory and upload any image files found to the given hub-format store.
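A minimal sketch of the recursive walk, to make the idea concrete. The function name `find_image_files` and the constant `DEFAULT_EXTENSIONS` are hypothetical, and the sketch only handles a local directory (not an s3:// source). Extension matching is case-sensitive, on the assumption that the tool would accept an explicit list such as jpg,jpeg,JPG,JPEG.

```python
from pathlib import Path
from typing import Iterable, Iterator

# Hypothetical default, assuming the tool targets JPEG files unless told otherwise.
DEFAULT_EXTENSIONS = ("jpg", "jpeg", "JPG", "JPEG")


def find_image_files(
    source_dir: str, extensions: Iterable[str] = DEFAULT_EXTENSIONS
) -> Iterator[Path]:
    """Yield every file under source_dir whose extension is in `extensions`."""
    # Normalise "jpg" and ".jpg" to the ".jpg" form used by Path.suffix.
    wanted = {"." + ext.lstrip(".") for ext in extensions}
    # rglob("*") walks the whole tree; sorting keeps the upload order stable.
    for path in sorted(Path(source_dir).rglob("*")):
        if path.is_file() and path.suffix in wanted:
            yield path
```

Each yielded path could then be handed to hub.read() and appended to the dataset.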
Description of the possible solution
Suggested API:
hub-upload-images --source-path <dir>                  # directory with image files
                  --labels-file <labels-file>          # text file with a mapping of (base) filename to label
                  --hub-store-path <path>              # path to the hub-store
                  --sample-compression <compression>
                  [--extensions <extensions>]          # optional. Default: jpg,jpeg,JPG,JPEG
                  [--verify|--no-verify]               # set verify=True or False in hub.read(). Default: True
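For discussion, here is one way the suggested options could be declared with argparse. This is only a sketch: `build_parser` is a hypothetical name, and it assumes --sample-compression may be omitted (the first example invocation leaves it out) while the three path options are required.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the options from the suggested API (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="hub-upload-images")
    parser.add_argument("--source-path", required=True,
                        help="directory with image files")
    parser.add_argument("--labels-file", required=True,
                        help="text file mapping (base) filename to label")
    parser.add_argument("--hub-store-path", required=True,
                        help="path to the hub-store")
    # Assumption: optional, so it can be inferred when not given.
    parser.add_argument("--sample-compression", default=None)
    # Comma-separated list, split into a Python list.
    parser.add_argument("--extensions",
                        type=lambda value: value.split(","),
                        default=["jpg", "jpeg", "JPG", "JPEG"])
    # --verify / --no-verify share one destination; verification is on by default.
    parser.add_argument("--verify", dest="verify", action="store_true")
    parser.add_argument("--no-verify", dest="verify", action="store_false")
    parser.set_defaults(verify=True)
    return parser
```

The parsed `verify` flag would be passed through to hub.read().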
The labels-file could be a comma-separated file with entries like:
filename1,0
filename2,1
filename3,0
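A labels file in that shape could be read with the standard csv module. The sketch below assumes labels are integers and that each key is the base filename exactly as written in the file; `load_labels` is a hypothetical helper name.

```python
import csv
from typing import Dict


def load_labels(labels_file: str) -> Dict[str, int]:
    """Read 'filename,label' rows into a dict (assumes integer labels)."""
    labels: Dict[str, int] = {}
    with open(labels_file, newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            filename, label = row[0].strip(), row[1].strip()
            labels[filename] = int(label)
    return labels
```

During upload, the label for an image would be looked up by its base filename.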
Example invocations:
# Upload all JPEG files (extensions jpg, jpeg, JPG or JPEG) to a hub store in a local directory
$ hub-upload-images --source-path ./data/images --labels-file ./data/labels.txt --hub-store-path ./data/hub-store --extensions jpg,jpeg,JPG,JPEG
# Upload local-disk to s3
$ hub-upload-images --source-path ./data/images --labels-file ./data/labels.txt --hub-store-path s3://hub-datasets/myimages-set1 --extensions jpg,JPEG,png,PNG --sample-compression jpeg
# Upload local-disk to hub
$ hub-upload-images --source-path ./data/images --labels-file ./data/labels.txt --hub-store-path hub://mydatasets/myimage-set1 --sample-compression jpeg
# Upload s3 to hub
$ hub-upload-images --source-path s3://mydata/images-for-ml --labels-file ./data/labels.txt --hub-store-path hub://mydatasets/myimage-set1 --sample-compression jpeg
Notes: If there is only one file extension, then the sample_compression can be inferred from it automatically.
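One possible way to do that inference is sketched below. The mapping `EXTENSION_TO_COMPRESSION` and the function `infer_sample_compression` are hypothetical, and the sketch goes slightly beyond the note: it also infers a compression when several extensions (for example jpg and JPEG) all map to the same format.

```python
from typing import Iterable, Optional

# Assumed mapping from file extension to a compression name; extend as needed.
EXTENSION_TO_COMPRESSION = {"jpg": "jpeg", "jpeg": "jpeg", "png": "png"}


def infer_sample_compression(extensions: Iterable[str]) -> Optional[str]:
    """Return a compression name if all extensions agree on one, else None."""
    compressions = {
        EXTENSION_TO_COMPRESSION.get(ext.lower().lstrip("."))
        for ext in extensions
    }
    if len(compressions) == 1:
        (only,) = compressions
        return only  # None when the single extension is not in the mapping
    return None  # mixed formats or no extensions: the user must choose
```

When this returns None, the tool could require --sample-compression to be passed explicitly.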
Difficulty: Hard
If it's alright, I'd love to work on this.