# Boundary IoU API

## Motivation

This PR proposes to add support for boundary IoU as described and implemented here:
https://github.com/bowenc0221/boundary-iou-api/tree/master

Basic usage is as follows:
```python
from faster_coco_eval import COCO
from faster_coco_eval import COCOeval_faster as COCOeval
from faster_coco_eval.core.boundary_utils import add_boundary_multi_core

cocoGt = COCO(annotation_file)
cocoDt = cocoGt.loadRes(results)

# add boundaries (still slow)
add_boundary_multi_core(cocoGt)
add_boundary_multi_core(cocoDt)

# run evaluation (fast now)
cocoEval = COCOeval(cocoGt, cocoDt, "boundary", print_function=print, extra_calc=True)
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()

print(cocoEval.stats_as_dict)
```
The speedup for the "boundary" evaluation is comparable to the speedup for "segm". The `add_boundary_multi_core` method, however, is still very slow. It suffers from IPC overhead, so I experimented with `shared_memory` but ultimately decided to keep it simple for now. In summary, the implementation stays very close to the original boundary-iou-api, benefits from the existing speed-ups in the evaluation, and does not break backwards compatibility.
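For context, the multi-core conversion roughly follows the pattern below. This is a minimal sketch with a dummy stand-in computation; `_boundary_worker` and `add_boundary_pool` are hypothetical names, not the library's actual API. Each annotation dict is pickled to a worker process and the result is pickled back, which is exactly where the IPC overhead comes from:

```python
from multiprocessing import Pool

def _boundary_worker(ann):
    # placeholder: the real code would decode the mask, erode it, and
    # store the boundary RLE; here we just attach a dummy value
    ann["boundary"] = list(reversed(ann["segmentation"]))
    return ann

def add_boundary_pool(anns, processes=2):
    # every `ann` dict is serialized to a worker and back (IPC overhead);
    # shared_memory would avoid the copies at the cost of complexity
    with Pool(processes) as pool:
        return pool.map(_boundary_worker, anns, chunksize=64)
```

The `chunksize` argument batches annotations per IPC round-trip, which amortizes some of the pickling cost but does not remove it.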
## Dependency: OpenCV

The method `mask_to_boundary` requires OpenCV. Since the original repository removed the dependency from its setup.py here, I did not add it either, for now. This should be addressed, though: either add opencv-python/opencv-python-headless to the dependency list, or inform the user that OpenCV is required to run boundary IoU.
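One lightweight option for the second approach, sketched here as a hypothetical helper (not part of the library):

```python
def require_opencv():
    # fail early with an actionable message instead of a bare
    # ImportError deep inside mask_to_boundary
    try:
        import cv2  # provided by opencv-python or opencv-python-headless
    except ImportError as err:
        raise ImportError(
            "Boundary IoU requires OpenCV. Install opencv-python or "
            "opencv-python-headless."
        ) from err
    return cv2
```

Calling this once at the top of the boundary code path keeps the base install dependency-free while giving users a clear fix.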
## Checklist

- Flake8 was used
- Unit tests have been added
- The documentation should be updated with the example above (the Wiki, which I can not edit, right?)
Hi, friend! If you continue to work on your PR, check out the new version of the library. I added LVIS support, some useful C++ features, data preprocessing acceleration, etc.
You can see the full list of changes in history.md
https://github.com/MiXaiLL76/faster_coco_eval/commit/3ea78b47102477763f76843b4e097a9dee91e419
I probably solved the problem from this PR. Look at the code. I implemented some functions in C++ and they are faster than the originals. It is not possible to hook up multithreading yet, but I think I will figure it out in the future.
Hey @MiXaiLL76, I just tested #3ea78 with a pre-trained model on ms-coco and can confirm that it works. Please note, however, that I'm NOT an expert in C++ and therefore in no position to review or comment on those code changes. What I can do is test from a user perspective.
## Timings

On my machine I get the following timings for boundary IoU, covering 'mask to boundary' for GT and DT, 'evaluate', and 'accumulate'. Note that in faster_coco_eval, the conversion happens during the evaluate call. As can be seen, 'mask_api' is competitive. If you can figure out multiprocessing, it should become super fast.
| method | total | conversion |
|---|---|---|
| original boundary_iou (cpu_num=1) | 233s | 208s |
| original boundary_iou (cpu_num=16) | 78s | 50s |
| faster_coco_eval ('mask_api') | 84s | 84s |
| faster_coco_eval ('opencv') | 284s | 284s |
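Timings like these can be reproduced with a minimal wall-clock harness (a hypothetical helper, not part of either library):

```python
import time

def timed(label, fn, *args, **kwargs):
    # run fn once and report wall-clock seconds for that phase
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - t0:.1f}s")
    return result

# usage (sketch): timed("evaluate", cocoEval.evaluate)
```

Wrapping each phase separately ('mask to boundary', 'evaluate', 'accumulate') gives the per-column breakdown above.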
## Numerical difference

The opencv backend produces results equivalent to the original boundary_iou api. The mask_api backend, however, results in a small difference. I suspect a small discrepancy in the 'toBoundary' call.
| metric | abs diff between 'opencv' and 'mask_api' |
|---|---|
| AP | 0.000801060954045002 |
| AP50 | 0.0013774061439439933 |
| AP75 | 0.0014265724507938338 |
| APs | 9.674013682015037e-06 |
| APm | 0.0011956629580721634 |
| APl | 0.0010393406044141573 |
| AR1 | 0.0005525991919907436 |
| AR10 | 0.0007796195093274783 |
| AR100 | 0.000799188643608062 |
| ARs | 1.478404244537046e-05 |
| ARm | 0.001231027036155019 |
| ARl | 0.0011080064218199626 |
| AP55 | 0.0009091696333984323 |
| AP60 | 0.0008645775720608762 |
| AP65 | 0.0006132017268402068 |
| AP70 | 0.0011151904680729574 |
| AP80 | 0.0010361626330520901 |
| AP85 | 0.00048417112348642793 |
| AP90 | 0.0001841577888008275 |
| AP95 | 0.0 |
When I tinkered with the kornia and numpy versions, I used something like this to test the 'to_boundary' call against the opencv implementation:
```python
import torch
import torchvision.transforms.functional as TF

def cross():
    # generate a fake mask for testing
    mask = torch.zeros((1, 800, 800), dtype=torch.uint8)
    mask[:, 100:700, 350:400] = 255
    mask[:, 350:400, 100:700] = 255
    for i in range(5):
        mask = torch.max(mask, TF.rotate(mask, 30))
    return mask
```
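For reference, the conversion all of these backends approximate follows the original boundary-iou-api recipe: erode the mask by roughly 2% of the image diagonal and subtract, leaving a thin boundary band. Below is a dependency-free NumPy sketch of that recipe (my own re-implementation for sanity checks, not the library's code):

```python
import numpy as np

def erode3x3(mask, iterations):
    # binary 3x3 erosion; pixels outside the image count as background,
    # so objects touching the border erode there too (matching the
    # zero-padding the original opencv implementation uses)
    m = mask.astype(bool)
    h, w = m.shape
    for _ in range(iterations):
        p = np.pad(m, 1, constant_values=False)
        out = np.ones_like(m)
        for dy in range(3):
            for dx in range(3):
                out &= p[dy:dy + h, dx:dx + w]
        m = out
    return m

def mask_to_boundary(mask, dilation_ratio=0.02):
    # boundary = mask minus the mask eroded by ~2% of the image diagonal
    h, w = mask.shape
    dilation = max(1, int(round(dilation_ratio * (h * h + w * w) ** 0.5)))
    return mask.astype(np.uint8) - erode3x3(mask, dilation).astype(np.uint8)

mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:8, 2:8] = 1
print(mask_to_boundary(mask).sum())  # -> 20, the one-pixel ring of the 6x6 square
```

Comparing the output of this against each backend on a single mask like `cross()` makes it easy to spot where the implementations diverge.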
As far as I can tell, version 1.6.0 will make this PR obsolete, so feel free to close it as you please. Also let me know if you need additional testing.
Can you share your test dataset? GT+DT? So that I can add it to the tests and possibly correct the differences in values
Yes sure. GT is ms-coco instances_val2017.json. And here is the results file I used for testing: https://drive.google.com/file/d/14JsfBm5Dd9COr-7--LXKd48OEtehNmP3
It already includes the computed boundaries so I do the following before testing:
```python
with open(results_file, 'r') as f:
    results = json.load(f)

for r in results:
    r.pop('boundary')
```
Personally, I found it easier to compare the 'boundaries' returned by the different apis for a single test mask as shown above. If you achieve equivalence there, the metrics should follow.
Thank you, I will study the data and correct the scripts if possible
@JohannesTheo look at this!
https://github.com/MiXaiLL76/faster_coco_eval/commit/30a9891a30526154d9d61dfcd9369301e6f7962b
I made the calculations asynchronous. Quite complex code, but it works, and quite quickly!
True, it does not fully replicate the opencv function, but the error between the functions is about 0.001 in my tests.
Hey @MiXaiLL76, unfortunately I can't get 30a9891 to work. From what I can tell, CPU usage peaks for a very short time during calculateRleForAllAnnotations but then it gets stuck somehow (I stopped it after several minutes). Any suggestions on how to debug or test this further?
I used ms-coco instances_val2017.json and the results file linked above. I tested with boundary_cpu_count=1, 4, 16 and on two machines (Intel(R) Core(TM) i7-6950X and AMD EPYC 9354 32-Core).
Can you show a code example of how you tried to run this? Everything worked for me during validation
Yes sure.
Env:
```shell
conda create --name fce_1.6 python=3.11
conda activate fce_1.6

git clone https://github.com/MiXaiLL76/faster_coco_eval.git
cd faster_coco_eval/
git checkout mask_cpp
pip install ./
```
Code:
```python
import json

import faster_coco_eval
from faster_coco_eval import COCO, COCOeval_faster

assert faster_coco_eval.__version__ == '1.6.0'

annotation_file = "./instances_val2017.json"
results_file = "./ms-coco_segm.json"

with open(results_file, 'r') as f:
    results = json.load(f)

for r in results:
    r.pop('boundary')

cocoGt = COCO(annotation_file)
cocoDt = cocoGt.loadRes(results)

cocoEval = COCOeval_faster(cocoGt, cocoDt, 'boundary', print_function=print, extra_calc=False, boundary_cpu_count=4)
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()
```
Maybe it's hitting some system limit, like open files, when using the async C++? Just a wild guess. Hope the code example helps to narrow things down. Let me know if I can do anything else. I'm on Debian btw; I'll check the compiler version tomorrow and edit here.
I found a bug in the RLE::merge function. (https://github.com/MiXaiLL76/faster_coco_eval/commit/2eaec46808e3e14fbc8dd0504fcc6584ba3c0d1f)
I rewrote it to be C++17 compliant. (I just copied it before without touching it (heh))
I was able to run the same code as you; please try it too:
```python
@profile(stdout=False, filename='faster_coco_eval.prof')
def faster_coco_eval():
    prepared_coco_in_dict = "/home/mixaill76/faster_coco_eval/examples/ultralytics/datasets/coco/annotations/instances_val2017.json"
    prepared_anns = "/home/mixaill76/faster_coco_eval/examples/ms-coco_segm.json"

    cocoGt = COCO(prepared_coco_in_dict)

    results = COCO.load_json(prepared_anns)
    for r in results:
        r.pop('boundary')

    cocoDt = cocoGt.loadRes(results)

    cocoEval = COCOeval_faster(cocoGt, cocoDt, "boundary", print_function=print, boundary_cpu_count=12)
    cocoEval.evaluate()
    cocoEval.accumulate()
    cocoEval.summarize()

faster_coco_eval()
```
Hey @MiXaiLL76, I can confirm it works now 🥳 I don't know what wizardry you pulled off but dude, this thing's fast 🔥
In addition to the ms-coco annotations, I also tested on sama-coco and coco-rem (two relabeled versions of instances_val2017.json). I used the DTs linked above. Note that these times are for the complete evaluation process, including the loading of GTs and DTs. For boundary, I ran with 16 'cores' in both boundary_iou and faster_coco_eval.
| GT version | pycocotools | boundary_iou | faster_coco_eval |
|---|---|---|---|
| ms-coco | ------------- | -------------- | ------------------ |
| bbox | 0:00:16 | 0:00:16 | 0:00:03 |
| segm | 0:00:20 | 0:00:20 | 0:00:08 |
| boundary | x | 0:01:11 | 0:00:28 |
| sama-coco | ------------- | -------------- | ------------------ |
| bbox | 0:00:22 | 0:00:23 | 0:00:05 |
| segm | 0:00:27 | 0:00:27 | 0:00:10 |
| boundary | x | 0:01:18 | 0:00:29 |
| coco-rem | ------------- | -------------- | ------------------ |
| bbox | 0:00:19 | 0:00:20 | 0:00:05 |
| segm | 0:00:23 | 0:00:23 | 0:00:09 |
| boundary | x | 0:01:17 | 0:00:27 |
Regarding numerical differences, I only print a metric if abs(m1-m2) > 0 for the respective pairing. As can be seen below, bbox and segm show no, or only a negligible, difference between the libs. In the case of boundary, the difference is very small. Since the 'opencv' backend showed no difference in my previous test, this can be narrowed down to the new boundary creation. The diffs are consistent but small, so I suspect a small bug, possibly an off-by-one in the erosion or dilation or something like that. When I coded the kornia and numpy versions I made this mistake myself and saw similar differences to the opencv implementation. If you are motivated, it might be worth looking into that to achieve 100% parity.
Either way, great effort!
| GT version | metric | pyct vs. biou | pyct vs. fast | biou vs. fast |
|---|---|---|---|---|
| ms-coco bbox | | | | |
| | AP95 | 0.0 | 6.938893903907228e-18 | 6.938893903907228e-18 |
| ms-coco segm | | | | |
| ms-coco boundary | | | | |
| | AP | - | - | 4.5666008668954206e-08 |
| | AP50 | - | - | 4.566600867450532e-07 |
| | APl | - | - | 4.606795220296611e-08 |
| | AR10 | - | - | 1.7985611510673571e-06 |
| | AR100 | - | - | 1.7985611511228683e-06 |
| | ARl | - | - | 3.063725490193292e-06 |
| sama-coco bbox | | | | |
| sama-coco segm | | | | |
| sama-coco boundary | | | | |
| | AP | - | - | 8.046750677559444e-06 |
| | AP75 | - | - | 8.046750677528913e-05 |
| | APs | - | - | 2.7755575615628914e-17 |
| | APm | - | - | 5.730490150457346e-06 |
| | APl | - | - | 8.583186332788983e-06 |
| | AR1 | - | - | 5.040322580640577e-06 |
| | AR10 | - | - | 5.040322580640577e-06 |
| | AR100 | - | - | 9.441731031401002e-06 |
| | ARm | - | - | 1.5432098765422175e-05 |
| | ARl | - | - | 9.328358208993137e-06 |
| | AP85 | - | - | 1.3877787807814457e-17 |
| coco-rem bbox | | | | |
| coco-rem segm | | | | |
| coco-rem boundary | | | | |
| | AP | - | - | 2.607371973883943e-07 |
| | APm | - | - | 1.7694598591910804e-07 |
| | AR1 | - | - | 1.8011527377348457e-06 |
| | AR10 | - | - | 1.8011527377348457e-06 |
| | AR100 | - | - | 1.801152737790357e-06 |
| | ARm | - | - | 4.56204379561953e-06 |
| | AP85 | - | - | 1.3877787807814457e-17 |
| | AP95 | - | - | 2.6073719743707497e-06 |
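To illustrate the suspected class of bug: with a 3x3 erosion, whether out-of-image pixels count as foreground or background changes the boundary of every object touching the image border, which produces exactly this kind of small but consistent metric drift. A self-contained toy demonstrating the effect (my own illustration, unrelated to either codebase):

```python
import numpy as np

def erode3x3(mask, outside):
    # 3x3 binary erosion; `outside` controls how pixels beyond the image
    # border are treated (True = foreground, False = background)
    h, w = mask.shape
    p = np.pad(mask.astype(bool), 1, constant_values=outside)
    out = np.ones((h, w), dtype=bool)
    for dy in range(3):
        for dx in range(3):
            out &= p[dy:dy + h, dx:dx + w]
    return out

mask = np.zeros((6, 6), dtype=np.uint8)
mask[0:4, 0:4] = 1  # object touching the image border

# padding with background turns border pixels into boundary pixels;
# padding with foreground keeps them interior
b_bg = (mask - erode3x3(mask, False).astype(np.uint8)).sum()
b_fg = (mask - erode3x3(mask, True).astype(np.uint8)).sum()
print(b_bg, b_fg)  # -> 12 7
```

The two conventions disagree only along the image border, so masks of small border-touching objects shift the metrics by the tiny amounts seen in the table.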
Thanks for the tests and the metrics! In fact, the error is still there; I can show you an example of what it looks like.
As you can see, opencv leaves a line on some objects, while my library removes it. This is an example of the erode operation.
I will try to debug it to get a 100% match, but for now I consider an error of 5e-6 insignificant.
I hope these are the last fixes in this function))) https://github.com/MiXaiLL76/faster_coco_eval/commit/71b245df2ccd5b8af9332cf5ec438a2ab9174f85
Please, friend, run your tests again and compare the AP results between the libraries. I ran opencv vs mask_api and got a 100% match; I think I did it!
The error with the borders looked something like this
Hey @MiXaiLL76, I just tested the last fix and can confirm that it removes the numerical differences :) Now pycocotools, boundary-iou, and faster-coco-eval produce identical* results, with faster-coco-eval being, well, much faster :D I tested with the results file I provided and with results from a different checkpoint, just to verify.
Thanks for the effort and congrats on this achievement! This is really cool and quite useful 🥳
*As shown above, some metrics differ by around 10^-17, which I would consider negligible. Probably some rounding error or numerical-resolution issue, but not worth putting more work into.
Yes, errors around 10^-17 are usually an artifact of converting floating point numbers between C++ and Python.
It's not critical anywhere; I think it's a win. Thanks for participating in the development of the framework!
Now I'll work on the visualization functions =)
Thanks for implementing this! Really cool to have a fast boundary IoU in faster_coco_eval. Looking forward to the 1.6 release :)
I'll close the PR now