supervision
Show Progress in time consuming tasks
Search before asking
- [X] I have searched the Supervision issues and found no similar feature requests.
Description
It would be very helpful for users to see how a process is progressing, especially for time-consuming tasks such as dataset operations.
Example: Loading a large dataset can take a while. Meanwhile, users might wonder whether it is working properly or not. In the worst case, the process might even fail due to a memory issue or a problem with the dataset.
Use case
- Dataset ops
- Image saving ops
- Video processing ops
Additional
Solution:
Introducing a progress bar from tqdm could be very useful in such scenarios.
Are you willing to submit a PR?
- [X] Yes I'd like to help by submitting a PR!
@SkalskiP This can wait until later.
Hi @hardikdava 👋🏻 I was also thinking about it. There are two things we need to consider:
- The progress bar needs to work in the terminal and notebook environment.
- The progress bar needs to be optional. You should be able to turn that off.
Let's keep that issue open. We will pick it up in the future 100%.
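A minimal sketch of how the two requirements above might be met, assuming tqdm is the eventual choice. `tqdm.auto` selects a notebook or terminal bar depending on the environment, and tqdm's `disable` argument switches the bar off. The helper name `track` and its `show_progress` parameter are hypothetical and not part of supervision.

```python
from typing import Iterable, Iterator, Optional, TypeVar

try:
    # tqdm.auto picks a notebook widget or a terminal bar automatically.
    from tqdm.auto import tqdm
except ImportError:
    # Assumption: tqdm is treated as an optional dependency in this sketch.
    tqdm = None

T = TypeVar("T")


def track(
    items: Iterable[T],
    show_progress: bool = False,
    desc: Optional[str] = None,
) -> Iterator[T]:
    """Hypothetical helper: yield items, drawing a bar only when asked to."""
    if tqdm is None:
        # No tqdm installed: iterate silently.
        yield from items
    else:
        # disable=True makes tqdm behave like a plain iterator.
        yield from tqdm(items, desc=desc, disable=not show_progress)


# Progress is off by default, so existing callers see no change.
squares = [n * n for n in track(range(5))]
```

Keeping the switch as an explicit argument is only one option; an environment variable could drive the same flag.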
@SkalskiP
tqdm could be a solution for this:
https://github.com/tqdm/tqdm (it works in both terminals and notebooks, as you can see in the tqdm docs).
@onuralpszr oh I'm sure we will use tqdm to implement it. Awesome package.
@hardikdava Let me know if you've already started working on this! Would be fun to pick this up if not. Can help out wherever necessary if you have
Hi, @mayankagarwals 👋🏻! As far as I know, no one has started to work on this issue yet. Feel free to try :)
@mayankagarwals Yeah, I haven't started working on this. Feel free to try it out :)
Amazing. I think this problem will be solved differently for each of
1. Dataset ops
2. Image saving ops
3. Video processing ops
based on the process taking the most time in each and how homogeneously they can be divided to track progress.
For Dataset ops, I ran a small test on the function to load coco dataset
ds = sv.DetectionDataset.from_coco(
    images_directory_path="/Users/mayankagarwal/Documents/Personal/CS/codebase/Supervision-Parent/coco-segmentation/train",
    annotations_path="/Users/mayankagarwal/Documents/Personal/CS/codebase/Supervision-Parent/coco-segmentation/train/_annotations.coco.json",
    force_masks=True
)
This function call takes 3.394498109817505 seconds in total for the dataset https://universe.roboflow.com/athanasios-kokkinhs-iqro0/aedespupaeheads
The majority of that time, 3.370098114013672 seconds, is spent iterating over each image and converting its annotations to the supervision-specific Detections class. More specifically, it is spent in this piece of code:
for coco_image in coco_images:
    image_name, image_width, image_height = (
        coco_image["file_name"],
        coco_image["width"],
        coco_image["height"],
    )
    image_annotations = coco_annotations_groups.get(coco_image["id"], [])
    image_path = os.path.join(images_directory_path, image_name)
    image = cv2.imread(str(image_path))
    annotation = coco_annotations_to_detections(
        image_annotations=image_annotations,
        resolution_wh=(image_width, image_height),
        with_masks=force_masks,
    )
    annotation = map_detections_class_id(
        source_to_target_mapping=class_index_mapping,
        detections=annotation,
    )
    images[image_name] = image
    annotations[image_name] = annotation
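The thread does not say how the two timings were taken. One way to get a comparable split between total time and loop time is sketched below with `time.perf_counter`. The function `load_items` and its placeholder per-item work are hypothetical stand-ins, not the actual loader.

```python
import time


def load_items(paths):
    """Hypothetical stand-in for a loader; returns (items, seconds in the loop)."""
    loop_start = time.perf_counter()
    # Placeholder per-image work; the real loop reads images and converts annotations.
    items = {path: len(path) for path in paths}
    loop_seconds = time.perf_counter() - loop_start
    return items, loop_seconds


start = time.perf_counter()
items, loop_seconds = load_items(["a.jpg", "bb.jpg"])
total_seconds = time.perf_counter() - start

print(f"total: {total_seconds:.6f}s, loop: {loop_seconds:.6f}s")
```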
I believe this will be true of other dataset operations too. If @SkalskiP @hardikdava both of you agree, I believe for dataset loading operations we can show the progress of how many images have been iterated and processed. Please let me know, will proceed once we are aligned (considering I'm still fairly new to the ecosystem)
I will start by implementing this for just the coco dataset loading. As mentioned by @SkalskiP , will take care of these points too
- The progress bar needs to work in the terminal and notebook environment.
- The progress bar needs to be optional. You should be able to turn that off.
@SkalskiP @hardikdava Let me know what you folks think about this. Shouldn't be a big deal to raise a PR if this method seems appropriate.
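For illustration, per-image progress for a loop shaped like the COCO one might look as follows. This is a sketch under assumptions: `index_images` and `show_progress` are hypothetical names, the loop body is a placeholder for the real image reading and annotation conversion, and tqdm is treated as optional.

```python
try:
    from tqdm.auto import tqdm
except ImportError:
    # Assumption: fall back to silent iteration when tqdm is not installed.
    tqdm = None


def index_images(coco_images, show_progress=False):
    """Hypothetical stand-in for the loading loop: one progress tick per image."""
    iterator = coco_images
    if show_progress and tqdm is not None:
        iterator = tqdm(
            coco_images,
            total=len(coco_images),
            desc="Loading COCO",
            unit="image",
        )
    resolutions = {}
    for coco_image in iterator:
        # The real loop would read the image and convert its annotations here.
        resolutions[coco_image["file_name"]] = (
            coco_image["width"],
            coco_image["height"],
        )
    return resolutions


coco_images = [
    {"id": 1, "file_name": "a.jpg", "width": 640, "height": 480},
    {"id": 2, "file_name": "b.jpg", "width": 320, "height": 240},
]
resolutions = index_images(coco_images, show_progress=True)
```

Because each image is one unit of work, the bar advances evenly even when individual images differ in annotation count.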
@mayankagarwals yeah, first understand how the dataset API works and how it is designed. Then start with potentially time-consuming tasks such as loading or saving a dataset. Please let us know when you have implemented the first feature. After that, I think it will be very similar in other places. Ideally, implement the progress bar once and reuse it wherever possible.
@hardikdava @SkalskiP
I've gone through the dataset API.
Here's a version 0 demo:
https://colab.research.google.com/drive/1m2juLJAJYU-IJLIaQpPEWKU4AtgN7m6B?usp=sharing
It doesn't require many changes, thanks to the awesome tqdm team. Here is how it looks:
Let me know if I'm missing something
@hardikdava I think we need to make supervision logging optional.
Yeah @SkalskiP 100% agree with you.
@hardikdava we probably should use an environment variable to turn it on and off?
Yeah, we can add an environment variable, SV_LOGGING, that can be True or False. I suggest making it False by default, since users will probably have their own loggers.
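A rough sketch of reading such a switch, assuming the variable is called SV_LOGGING as suggested and defaults to off. The function name `logging_enabled` and the extra truthy spellings ("1", "yes", "on") are assumptions, not an agreed design.

```python
import os


def logging_enabled(environ=os.environ) -> bool:
    """Return True only when SV_LOGGING is set to a truthy value (default: off)."""
    value = environ.get("SV_LOGGING", "False")
    # Assumption: accept a few common truthy spellings, case-insensitively.
    return value.strip().lower() in {"1", "true", "yes", "on"}


# Passing a plain dict makes the behavior easy to check without touching os.environ.
print(logging_enabled({}))                      # unset, so off
print(logging_enabled({"SV_LOGGING": "True"}))  # explicitly on
```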
Yep, we can do that. But is this V0 what was envisioned? I'm yet to add the optional logging part and documentation, opened a draft PR.
@hardikdava I would like to have logging levels.
Visually it is okay from my side. :)
Probably best to delegate the nuance to a third party logger like https://github.com/Delgan/loguru
Cool! Making logging optional probably requires a decision on how supervision wants to proceed with logging as a whole.
I was actually talking with @SkalskiP and @hardikdava the other day, and I am already working on a POC to show how we could bring in loguru.
Amazing! The question is whether we can release this before making logging optional. Doing so would probably hurt any dataset loading benchmarks.
We can probably introduce loguru first and then merge this, gated on a loguru environment variable set to an appropriate log level.
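To make the idea of gating on a log level concrete, here is a sketch that uses the standard library `logging` level constants as a stand-in for loguru, since the loguru integration was not settled in this thread. The variable name SV_LOG_LEVEL, the WARNING default, and the INFO threshold are all hypothetical.

```python
import logging
import os

# Explicit name-to-level table so unknown names are easy to reject.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def progress_enabled(environ=os.environ, threshold=logging.INFO) -> bool:
    """Show progress only when the configured level is at least as verbose as INFO."""
    name = environ.get("SV_LOG_LEVEL", "WARNING").strip().upper()
    # Assumption: unknown level names fall back to WARNING, which keeps the bar off.
    level = _LEVELS.get(name, logging.WARNING)
    return level <= threshold


print(progress_enabled({"SV_LOG_LEVEL": "INFO"}))   # verbose enough, bar on
print(progress_enabled({"SV_LOG_LEVEL": "ERROR"}))  # quiet mode, bar off
```

With a default of WARNING the bar stays off unless a user opts in, which should keep dataset loading benchmarks unaffected.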
@mayankagarwals yeah. I prefer to introduce logging before moving forward with that task. Let's pause the work here, for now, to give @onuralpszr time to produce POC. @mayankagarwals, you can focus on Pascal VOC for now.
Agreed @SkalskiP
That's a very small change; I'm just blocked on something and have tagged you. I will take up the instance segmentation confusion matrix change.
Awesome!
@onuralpszr Heya, Please update this thread once you have a POC for loguru!
I have a meeting on Monday. After that I will decide what I want to do and, sure, I will update here! :)
Is there still work left on this issue? I am interested in playing a part.