
Show Progress in time consuming tasks

Open hardikdava opened this issue 2 years ago • 27 comments

Search before asking

  • [X] I have searched the Supervision issues and found no similar feature requests.

Description

It would be very helpful for users to see how a process is progressing, especially for time-consuming tasks such as dataset operations.

Example: Loading a large dataset can be time-consuming, and it can take a while to load the whole dataset. Meanwhile, users might wonder whether it is working properly. In the worst case, the process might even fail because of a memory issue or a problem with the dataset.

Use case

  • Dataset ops
  • Image saving ops
  • Video processing ops

Additional

Solution:

Introducing a progress bar from tqdm could be very useful in such scenarios.

Are you willing to submit a PR?

  • [X] Yes I'd like to help by submitting a PR!

hardikdava avatar Jul 06 '23 17:07 hardikdava

@SkalskiP This can wait for later.

hardikdava avatar Jul 06 '23 17:07 hardikdava

Hi @hardikdava 👋🏻 I was also thinking about it. There are two things we need to consider:

  • The progress bar needs to work in the terminal and notebook environment.
  • The progress bar needs to be optional. You should be able to turn that off.

Let's keep that issue open. We will pick it up in the future 100%.

SkalskiP avatar Jul 10 '23 14:07 SkalskiP

@SkalskiP

tqdm could be a solution for this.

https://github.com/tqdm/tqdm works in both terminals and notebooks, as you can see in the tqdm docs.

onuralpszr avatar Jul 18 '23 12:07 onuralpszr

@onuralpszr oh I'm sure we will use tqdm to implement it. Awesome package.

SkalskiP avatar Jul 18 '23 12:07 SkalskiP

@hardikdava Let me know if you've already started working on this! Would be fun to pick this up if not. Can help out wherever necessary if you have

mayankagarwals avatar Jul 23 '23 09:07 mayankagarwals

Hi, @mayankagarwals 👋🏻! As far as I know, no one has started working on this issue yet. Feel free to try :)

SkalskiP avatar Jul 23 '23 11:07 SkalskiP

@mayankagarwals Yeah, I haven't started working on this. Feel free to try it out :)

hardikdava avatar Jul 23 '23 11:07 hardikdava

Amazing. I think this problem will be solved differently for each of

1. Dataset ops
2. Image saving ops
3. Video processing ops

based on the process taking the most time in each and how homogeneously they can be divided to track progress.

For Dataset ops, I ran a small test on the function that loads a COCO dataset:

import supervision as sv

ds = sv.DetectionDataset.from_coco(
    images_directory_path="/Users/mayankagarwal/Documents/Personal/CS/codebase/Supervision-Parent/coco-segmentation/train",
    annotations_path="/Users/mayankagarwal/Documents/Personal/CS/codebase/Supervision-Parent/coco-segmentation/train/_annotations.coco.json",
    force_masks=True
)

This function call takes 3.394498109817505 seconds in total for the dataset https://universe.roboflow.com/athanasios-kokkinhs-iqro0/aedespupaeheads. Most of that time, 3.370098114013672 seconds, is spent iterating over each image and converting its annotations to the supervision-specific Detections class. More specifically, it is spent in this piece of code:

    for coco_image in coco_images:
        image_name, image_width, image_height = (
            coco_image["file_name"],
            coco_image["width"],
            coco_image["height"],
        )
        image_annotations = coco_annotations_groups.get(coco_image["id"], [])
        image_path = os.path.join(images_directory_path, image_name)

        image = cv2.imread(str(image_path))
        annotation = coco_annotations_to_detections(
            image_annotations=image_annotations,
            resolution_wh=(image_width, image_height),
            with_masks=force_masks,
        )
        annotation = map_detections_class_id(
            source_to_target_mapping=class_index_mapping,
            detections=annotation,
        )

        images[image_name] = image
        annotations[image_name] = annotation
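
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch. It is not the code used to get the numbers above; the `timed` helper and the stand-in loop are hypothetical and only illustrate timing an inner stage against the whole call with `time.perf_counter`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Add the wall-clock duration of the enclosed block to results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        results[label] = results.get(label, 0.0) + elapsed


timings = {}
with timed("total", timings):
    # Hypothetical stand-in for everything from_coco does before the loop.
    coco_images = list(range(1000))
    with timed("per-image loop", timings):
        for _ in coco_images:  # stand-in for the loop over coco_images
            pass

for label, seconds in timings.items():
    print(f"{label}: {seconds:.6f} s")
```

Comparing the two entries shows how much of the total is spent inside the loop, which is the stage a progress bar would naturally track.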

I believe this will be true of other dataset operations too. If you both agree, @SkalskiP @hardikdava, I believe that for dataset loading operations we can show progress as the number of images iterated over and processed so far. Please let me know; I will proceed once we are aligned (considering I'm still fairly new to the ecosystem).

I will start by implementing this for just the COCO dataset loading. As mentioned by @SkalskiP, I will take care of these points too:

  • The progress bar needs to work in the terminal and notebook environment.
  • The progress bar needs to be optional. You should be able to turn that off.
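
A minimal sketch of how both requirements might be met with tqdm. This is an assumption about one possible shape, not the code from the PR: the `with_progress` helper and the `show_progress` flag are hypothetical names. `tqdm.auto` selects the notebook widget or the terminal bar depending on the environment, and tqdm's `disable` argument turns the bar off.

```python
try:
    # tqdm.auto uses the notebook widget in Jupyter and the text bar in a terminal.
    from tqdm.auto import tqdm
except ImportError:  # in this sketch tqdm is treated as an optional dependency
    tqdm = None


def with_progress(items, desc="", show_progress=True):
    """Return an iterable over items, wrapped in a progress bar when enabled."""
    if tqdm is None:
        return iter(items)
    return tqdm(items, desc=desc, disable=not show_progress)


# Hypothetical stand-in for the parsed COCO image entries.
coco_images = [{"file_name": f"{i}.jpg"} for i in range(5)]
loaded = [
    image["file_name"]
    for image in with_progress(coco_images, desc="Loading images", show_progress=False)
]
```

Because the helper only wraps an iterable, the same function could be reused for saving and video loops without touching their bodies.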

mayankagarwals avatar Jul 23 '23 16:07 mayankagarwals

@SkalskiP @hardikdava Let me know what you folks think about this. Shouldn't be a big deal to raise a PR if this method seems appropriate.

mayankagarwals avatar Jul 24 '23 11:07 mayankagarwals

@mayankagarwals yeah, first understand how the dataset API works and how it is designed. Then start with potentially time-consuming tasks such as loading or saving a dataset. Please let us know when you have implemented the first feature. After that, I think it will be very similar in other places. Ideally, implement the progress bar once and reuse it wherever possible.

hardikdava avatar Jul 24 '23 11:07 hardikdava

@hardikdava @SkalskiP

I've gone through the dataset API.
Here's a version 0 demo: https://colab.research.google.com/drive/1m2juLJAJYU-IJLIaQpPEWKU4AtgN7m6B?usp=sharing

It doesn't require many changes, thanks to the awesome tqdm team. Here is how it looks:

[image]

Let me know if I'm missing something

mayankagarwals avatar Jul 24 '23 11:07 mayankagarwals

@hardikdava I think we need to make supervision logging optional.

SkalskiP avatar Jul 24 '23 12:07 SkalskiP

Yeah @SkalskiP 100% agree with you.

hardikdava avatar Jul 24 '23 12:07 hardikdava

@hardikdava we probably should use an environment variable to turn it on and off?

SkalskiP avatar Jul 24 '23 12:07 SkalskiP

Yeah, we can add an env variable SV_LOGGING that can be True or False. I suggest making it False by default, since users will probably have their own loggers.
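
As a rough sketch of that idea, assuming the variable is read as a string and anything other than a small set of truthy values counts as off: only the name SV_LOGGING comes from this thread, while the `logging_enabled` helper and the accepted spellings are hypothetical.

```python
import os

# Spellings treated as "on" in this sketch; everything else means "off".
_TRUE_VALUES = {"1", "true", "yes", "on"}


def logging_enabled(environ=os.environ):
    """Return True only when SV_LOGGING is set to a truthy value (default: off)."""
    return environ.get("SV_LOGGING", "False").strip().lower() in _TRUE_VALUES


print(logging_enabled({}))                       # unset, so disabled
print(logging_enabled({"SV_LOGGING": "True"}))   # explicitly enabled
```

Passing the mapping as a parameter keeps the check easy to test without changing the real process environment.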

hardikdava avatar Jul 24 '23 12:07 hardikdava

Yep, we can do that. But is this V0 what was envisioned? I have yet to add the optional logging part and the documentation; I have opened a draft PR.

mayankagarwals avatar Jul 24 '23 13:07 mayankagarwals

Yeah, we can add env variable SV_LOGGING can be True or False. I suggest to make it False as default. Since users will probably have their own Loggers.

@hardikdava I would like to have logging levels.

Yep, we can do that. But is this V0 what was envisioned? I'm yet to add the optional logging part and documentation, opened a draft PR.

Visually it is okay from my side. :)

SkalskiP avatar Jul 24 '23 21:07 SkalskiP

@hardikdava I would like to have logging levels.

Probably best to delegate the nuance to a third party logger like https://github.com/Delgan/loguru

Visually it is okey from my side. :)

Cool! Making logging optional will probably require a decision on how supervision wants to proceed with logging as a whole.

mayankagarwals avatar Jul 25 '23 04:07 mayankagarwals

I was actually talking with @SkalskiP and @hardikdava the other day, and I am already working on a POC to show how we could bring in loguru.

onuralpszr avatar Jul 25 '23 05:07 onuralpszr

I was actually talking with @SkalskiP @hardikdava other day and I was already working on POC and show about bring loguru.

Amazing! The question is whether we can release this before making logging optional. It will probably harm any dataset loading benchmarks.

We can probably introduce loguru first and then merge this, gated on a loguru environment variable set to an appropriate log level.
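
To illustrate the gating idea, here is a sketch that uses the standard library `logging` levels instead of loguru, since the loguru integration had not been designed at this point in the thread. The variable name SV_LOG_LEVEL and both helpers are hypothetical.

```python
import logging
import os


def resolve_level(environ=os.environ, default=logging.WARNING):
    """Map a level name from the hypothetical SV_LOG_LEVEL variable to an int."""
    name = environ.get("SV_LOG_LEVEL", "").strip().upper()
    if not name:
        return default
    # getLevelName returns the numeric level for a known name, else a string.
    level = logging.getLevelName(name)
    return level if isinstance(level, int) else default


def progress_enabled(environ=os.environ):
    """Show progress bars only when the level is INFO or more verbose."""
    return resolve_level(environ) <= logging.INFO


print(progress_enabled({}))                         # default WARNING: no bar
print(progress_enabled({"SV_LOG_LEVEL": "debug"}))  # verbose: bar shown
```

With a default of WARNING the bar stays off unless a user opts in, so dataset loading benchmarks would not be affected by default.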

mayankagarwals avatar Jul 25 '23 05:07 mayankagarwals

@mayankagarwals yeah. I prefer to introduce logging before moving forward with that task. Let's pause the work here, for now, to give @onuralpszr time to produce a POC. @mayankagarwals, you can focus on Pascal VOC for now.

SkalskiP avatar Jul 25 '23 09:07 SkalskiP

I prefer to introduce logging before moving forward with that task

Agreed @SkalskiP

@mayankagarwals, you can focus on Pascal VOC for now.

That's a very small change, just blocked at something. Have tagged you. Will take up the instance segmentation confusion matrix change

mayankagarwals avatar Jul 25 '23 10:07 mayankagarwals

Will take up the instance segmentation confusion matrix change

Awesome!

SkalskiP avatar Jul 25 '23 10:07 SkalskiP

@onuralpszr Heya, Please update this thread once you have a POC for loguru!

mayankagarwals avatar Aug 19 '23 18:08 mayankagarwals

@onuralpszr Heya, Please update this thread once you have a POC for loguru!

I have a meeting on Monday; after that I will decide what I want to do, and sure, I will update here! :)

onuralpszr avatar Aug 19 '23 19:08 onuralpszr

Is there still work left on this issue? I am interested in playing a part.

KartikeyBartwal avatar Sep 07 '23 15:09 KartikeyBartwal