# Boundary IoU API

## Motivation

This PR proposes to add support for boundary IoU as described and implemented here:
https://github.com/bowenc0221/boundary-iou-api/tree/master

Basic usage is as follows:
```python
from faster_coco_eval import COCO
from faster_coco_eval import COCOeval_faster as COCOeval
from faster_coco_eval.core.boundary_utils import add_boundary_multi_core

cocoGt = COCO(annotation_file)
cocoDt = cocoGt.loadRes(results)

# add boundaries (still slow)
add_boundary_multi_core(cocoGt)
add_boundary_multi_core(cocoDt)

# run evaluation (fast now)
cocoEval = COCOeval(cocoGt, cocoDt, "boundary", print_function=print, extra_calc=True)
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()

print(cocoEval.stats_as_dict)
```
The speedup for the "boundary" evaluation is comparable to the speedup for "segm". The `add_boundary_multi_core` method, however, is still very slow. It suffers from IPC overhead, so I experimented with `shared_memory` but ultimately decided to keep it simple for now. In summary, the implementation stays very close to the original boundary-iou-api, benefits from the existing speed-ups in the evaluation, and does not break backwards compatibility.
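For context, the multi-core conversion roughly follows the pattern below. This is a minimal sketch with a dummy stand-in computation; `_boundary_worker` and `add_boundary_pool` are hypothetical names, not the library's actual API. Each annotation dict is pickled to a worker process and the result is pickled back, which is exactly where the IPC overhead comes from:

```python
from multiprocessing import Pool

def _boundary_worker(ann):
    # placeholder: the real code would decode the mask, erode it, and
    # store the boundary RLE; here we just attach a dummy value
    ann["boundary"] = list(reversed(ann["segmentation"]))
    return ann

def add_boundary_pool(anns, processes=2):
    # every `ann` dict is serialized to a worker and back (IPC overhead);
    # shared_memory would avoid the copies at the cost of complexity
    with Pool(processes) as pool:
        return pool.map(_boundary_worker, anns, chunksize=64)
```

The `chunksize` argument batches annotations per IPC round-trip, which amortizes some of the pickling cost but does not remove it.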
## Dependency: OpenCV

The method `mask_to_boundary` requires OpenCV. Since the original repository removed the dependency from its setup.py here, I did not add it either, for now. This should be addressed, though: either add opencv-python/opencv-python-headless to the dependency list, or inform the user that OpenCV is required to run boundary IoU.
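One lightweight option for the second approach, sketched here as a hypothetical helper (not part of the library):

```python
def require_opencv():
    # fail early with an actionable message instead of a bare
    # ImportError deep inside mask_to_boundary
    try:
        import cv2  # provided by opencv-python or opencv-python-headless
    except ImportError as err:
        raise ImportError(
            "Boundary IoU requires OpenCV. Install opencv-python or "
            "opencv-python-headless."
        ) from err
    return cv2
```

Calling this once at the top of the boundary code path keeps the base install dependency-free while giving users a clear fix.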
## Checklist

- Flake8 was used
- Unit tests have been added
- The documentation should be updated with the example above (the Wiki, which I can not edit, right?)
Hi, friend! If you continue to work on your PR, check out the new version of the library. I added LVIS support, some useful C++ features, data preprocessing acceleration, etc.
You can see the full list of changes in history.md
https://github.com/MiXaiLL76/faster_coco_eval/commit/3ea78b47102477763f76843b4e097a9dee91e419
I probably solved the problem from this PR. Look at the code. I implemented some functions in C++ and they are faster than the originals. It is not possible to hook up multithreading yet, but I think I will figure it out in the future.
Hey @MiXaiLL76, I just tested #3ea78 with a pre-trained model on ms-coco and can confirm that it works. Please note, however, that I'm NOT an expert in C++ and therefore in no position to review or comment on those code changes. What I can do is test from a user perspective.
## Timings

On my machine I get the following timings for boundary IoU, covering 'mask to boundary' for GT and DT, 'evaluate', and 'accumulate'. Note that in faster_coco_eval, the conversion happens during the evaluate call. As can be seen, 'mask_api' is competitive. If you can figure out multiprocessing, it should become super fast.
| method | total | conversion |
|---|---|---|
| original boundary_iou (cpu_num=1) | 233s | 208s |
| original boundary_iou (cpu_num=16) | 78s | 50s |
| faster_coco_eval ('mask_api') | 84s | 84s |
| faster_coco_eval ('opencv') | 284s | 284s |
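Timings like these can be reproduced with a minimal wall-clock harness (a hypothetical helper, not part of either library):

```python
import time

def timed(label, fn, *args, **kwargs):
    # run fn once and report wall-clock seconds for that phase
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - t0:.1f}s")
    return result

# usage (sketch): timed("evaluate", cocoEval.evaluate)
```

Wrapping each phase separately ('mask to boundary', 'evaluate', 'accumulate') gives the per-column breakdown above.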
## Numerical difference

The opencv backend produces results equivalent to the original boundary_iou api. The mask_api backend, however, results in a small difference. I suspect a small discrepancy in the 'toBoundary' call.
| metric | abs diff between 'opencv' and 'mask_api' |
|---|---|
| AP | 0.000801060954045002 |
| AP50 | 0.0013774061439439933 |
| AP75 | 0.0014265724507938338 |
| APs | 9.674013682015037e-06 |
| APm | 0.0011956629580721634 |
| APl | 0.0010393406044141573 |
| AR1 | 0.0005525991919907436 |
| AR10 | 0.0007796195093274783 |
| AR100 | 0.000799188643608062 |
| ARs | 1.478404244537046e-05 |
| ARm | 0.001231027036155019 |
| ARl | 0.0011080064218199626 |
| AP55 | 0.0009091696333984323 |
| AP60 | 0.0008645775720608762 |
| AP65 | 0.0006132017268402068 |
| AP70 | 0.0011151904680729574 |
| AP80 | 0.0010361626330520901 |
| AP85 | 0.00048417112348642793 |
| AP90 | 0.0001841577888008275 |
| AP95 | 0.0 |
When I tinkered with the kornia and numpy versions, I used something like this to test the 'to_boundary' call against the opencv implementation:
```python
import torch
import torchvision.transforms.functional as TF

def cross():
    # generate a fake mask for testing
    mask = torch.zeros((1, 800, 800), dtype=torch.uint8)
    mask[:, 100:700, 350:400] = 255
    mask[:, 350:400, 100:700] = 255
    for i in range(5):
        mask = torch.max(mask, TF.rotate(mask, 30))
    return mask
```
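For reference, the conversion all of these backends approximate follows the original boundary-iou-api recipe: erode the mask by roughly 2% of the image diagonal and subtract, leaving a thin boundary band. Below is a dependency-free NumPy sketch of that recipe (my own re-implementation for sanity checks, not the library's code):

```python
import numpy as np

def erode3x3(mask, iterations):
    # binary 3x3 erosion; pixels outside the image count as background,
    # so objects touching the border erode there too (matching the
    # zero-padding the original opencv implementation uses)
    m = mask.astype(bool)
    h, w = m.shape
    for _ in range(iterations):
        p = np.pad(m, 1, constant_values=False)
        out = np.ones_like(m)
        for dy in range(3):
            for dx in range(3):
                out &= p[dy:dy + h, dx:dx + w]
        m = out
    return m

def mask_to_boundary(mask, dilation_ratio=0.02):
    # boundary = mask minus the mask eroded by ~2% of the image diagonal
    h, w = mask.shape
    dilation = max(1, int(round(dilation_ratio * (h * h + w * w) ** 0.5)))
    return mask.astype(np.uint8) - erode3x3(mask, dilation).astype(np.uint8)

mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:8, 2:8] = 1
print(mask_to_boundary(mask).sum())  # -> 20, the one-pixel ring of the 6x6 square
```

Comparing the output of this against each backend on a single mask like `cross()` makes it easy to spot where the implementations diverge.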
As far as I can tell, version 1.6.0 will make this PR obsolete, so feel free to close it as you please. Also let me know if you need additional testing.
Can you share your test dataset? GT+DT? So that I can add it to the tests and possibly correct the differences in values
Yes sure. GT is ms-coco instances_val2017.json. And here is the results file I used for testing: https://drive.google.com/file/d/14JsfBm5Dd9COr-7--LXKd48OEtehNmP3
It already includes the computed boundaries so I do the following before testing:
```python
with open(results_file, 'r') as f:
    results = json.load(f)

for r in results:
    r.pop('boundary')
```
Personally, I found it easier to compare the 'boundaries' returned by the different apis for a single test mask as shown above. If you achieve equivalence there, the metrics should follow.
Thank you, I will study the data and correct the scripts if possible
@JohannesTheo look at this!
https://github.com/MiXaiLL76/faster_coco_eval/commit/30a9891a30526154d9d61dfcd9369301e6f7962b
I made the calculations asynchronous. Quite complex code, but it works, and quite quickly!
True, it does not fully replicate the opencv function, but the error between the functions is about 0.001 in my tests.
Hey @MiXaiLL76, unfortunately I can't get 30a9891 to work. From what I can tell, CPU usage peaks for a very short time during calculateRleForAllAnnotations but then it gets stuck somehow (I stopped it after several minutes). Any suggestions on how to debug or test this further?
I used ms-coco instances_val2017.json and the results file linked above. I tested with boundary_cpu_count=1, 4, 16 and on two machines (Intel(R) Core(TM) i7-6950X and AMD EPYC 9354 32-Core).
Can you show a code example of how you tried to run this? Everything worked for me during validation
Yes sure.
Env:
```shell
conda create --name fce_1.6 python=3.11
conda activate fce_1.6

git clone https://github.com/MiXaiLL76/faster_coco_eval.git
cd faster_coco_eval/
git checkout mask_cpp
pip install ./
```
Code:
```python
import json

import faster_coco_eval
from faster_coco_eval import COCO, COCOeval_faster

assert faster_coco_eval.__version__ == '1.6.0'

annotation_file = "./instances_val2017.json"
results_file = "./ms-coco_segm.json"

with open(results_file, 'r') as f:
    results = json.load(f)

for r in results:
    r.pop('boundary')

cocoGt = COCO(annotation_file)
cocoDt = cocoGt.loadRes(results)

cocoEval = COCOeval_faster(cocoGt, cocoDt, 'boundary', print_function=print, extra_calc=False, boundary_cpu_count=4)
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()
```
Maybe it's hitting some system limit, like open files, when using the async C++? Just a wild guess. Hope the code example helps to narrow things down. Let me know if I can do anything else. I'm on Debian btw; I'll check the compiler version tomorrow and edit here.
I found a bug in the RLE::merge function. (https://github.com/MiXaiLL76/faster_coco_eval/commit/2eaec46808e3e14fbc8dd0504fcc6584ba3c0d1f)
I rewrote it to be C++17 compliant. (I just copied it before without touching it (heh))
I was able to run the same code as you; please try it too:
```python
@profile(stdout=False, filename='faster_coco_eval.prof')
def faster_coco_eval():
    prepared_coco_in_dict = "/home/mixaill76/faster_coco_eval/examples/ultralytics/datasets/coco/annotations/instances_val2017.json"
    prepared_anns = "/home/mixaill76/faster_coco_eval/examples/ms-coco_segm.json"

    cocoGt = COCO(prepared_coco_in_dict)

    results = COCO.load_json(prepared_anns)
    for r in results:
        r.pop('boundary')

    cocoDt = cocoGt.loadRes(results)

    cocoEval = COCOeval_faster(cocoGt, cocoDt, "boundary", print_function=print, boundary_cpu_count=12)
    cocoEval.evaluate()
    cocoEval.accumulate()
    cocoEval.summarize()

faster_coco_eval()
```
Hey @MiXaiLL76, I can confirm it works now 🥳 I don't know what wizardry you pulled off but dude, this thing's fast 🔥
In addition to the ms-coco annotations, I also tested on sama-coco and coco-rem (two relabeled versions of instances_val2017.json). I used the DTs linked above. Note that these times are for the complete evaluation process, including the loading of GTs and DTs. For boundary, I ran with 16 'cores' in both boundary_iou and faster_coco_eval.
| GT version | pycocotools | boundary_iou | faster_coco_eval |
|---|---|---|---|
| ms-coco | ------------- | -------------- | ------------------ |
| bbox | 0:00:16 | 0:00:16 | 0:00:03 |
| segm | 0:00:20 | 0:00:20 | 0:00:08 |
| boundary | x | 0:01:11 | 0:00:28 |
| sama-coco | ------------- | -------------- | ------------------ |
| bbox | 0:00:22 | 0:00:23 | 0:00:05 |
| segm | 0:00:27 | 0:00:27 | 0:00:10 |
| boundary | x | 0:01:18 | 0:00:29 |
| coco-rem | ------------- | -------------- | ------------------ |
| bbox | 0:00:19 | 0:00:20 | 0:00:05 |
| segm | 0:00:23 | 0:00:23 | 0:00:09 |
| boundary | x | 0:01:17 | 0:00:27 |
Regarding numerical differences, I only print a metric if abs(m1-m2) > 0 for the respective pairing. As can be seen below, bbox and segm show no, or only a negligible, difference between the libs. In the case of boundary, the difference is very small. Since the 'opencv' backend showed no difference in my previous test, this can be narrowed down to the new boundary creation. The diffs are consistent but small, so I suspect a small bug, possibly an off-by-one in the erosion or dilation or something like that. When I coded the kornia and numpy versions I made this mistake myself and saw similar differences to the opencv implementation. If you are motivated, it might be worth looking into that to achieve 100% parity.
Either way, great effort!
| GT version | metric | pyct vs. biou | pyct vs. fast | biou vs. fast |
|---|---|---|---|---|
| ms-coco bbox | | | | |
| | AP95 | 0.0 | 6.938893903907228e-18 | 6.938893903907228e-18 |
| ms-coco segm | | | | |
| ms-coco boundary | | | | |
| | AP | - | - | 4.5666008668954206e-08 |
| | AP50 | - | - | 4.566600867450532e-07 |
| | APl | - | - | 4.606795220296611e-08 |
| | AR10 | - | - | 1.7985611510673571e-06 |
| | AR100 | - | - | 1.7985611511228683e-06 |
| | ARl | - | - | 3.063725490193292e-06 |
| sama-coco bbox | | | | |
| sama-coco segm | | | | |
| sama-coco boundary | | | | |
| | AP | - | - | 8.046750677559444e-06 |
| | AP75 | - | - | 8.046750677528913e-05 |
| | APs | - | - | 2.7755575615628914e-17 |
| | APm | - | - | 5.730490150457346e-06 |
| | APl | - | - | 8.583186332788983e-06 |
| | AR1 | - | - | 5.040322580640577e-06 |
| | AR10 | - | - | 5.040322580640577e-06 |
| | AR100 | - | - | 9.441731031401002e-06 |
| | ARm | - | - | 1.5432098765422175e-05 |
| | ARl | - | - | 9.328358208993137e-06 |
| | AP85 | - | - | 1.3877787807814457e-17 |
| coco-rem bbox | | | | |
| coco-rem segm | | | | |
| coco-rem boundary | | | | |
| | AP | - | - | 2.607371973883943e-07 |
| | APm | - | - | 1.7694598591910804e-07 |
| | AR1 | - | - | 1.8011527377348457e-06 |
| | AR10 | - | - | 1.8011527377348457e-06 |
| | AR100 | - | - | 1.801152737790357e-06 |
| | ARm | - | - | 4.56204379561953e-06 |
| | AP85 | - | - | 1.3877787807814457e-17 |
| | AP95 | - | - | 2.6073719743707497e-06 |
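To illustrate the suspected class of bug: with a 3x3 erosion, whether out-of-image pixels count as foreground or background changes the boundary of every object touching the image border, which produces exactly this kind of small but consistent metric drift. A self-contained toy demonstrating the effect (my own illustration, unrelated to either codebase):

```python
import numpy as np

def erode3x3(mask, outside):
    # 3x3 binary erosion; `outside` controls how pixels beyond the image
    # border are treated (True = foreground, False = background)
    h, w = mask.shape
    p = np.pad(mask.astype(bool), 1, constant_values=outside)
    out = np.ones((h, w), dtype=bool)
    for dy in range(3):
        for dx in range(3):
            out &= p[dy:dy + h, dx:dx + w]
    return out

mask = np.zeros((6, 6), dtype=np.uint8)
mask[0:4, 0:4] = 1  # object touching the image border

# padding with background turns border pixels into boundary pixels;
# padding with foreground keeps them interior
b_bg = (mask - erode3x3(mask, False).astype(np.uint8)).sum()
b_fg = (mask - erode3x3(mask, True).astype(np.uint8)).sum()
print(b_bg, b_fg)  # -> 12 7
```

The two conventions disagree only along the image border, so masks of small border-touching objects shift the metrics by the tiny amounts seen in the table.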
Thanks for the tests and the metrics! In fact, the error is still there; I can show you an example of what it looks like.
As you can see, opencv leaves a line on some objects, while my library removes it. This is an example of the erode operation.
I will try to debug it to get a 100% match, but for now I consider an error of 5e-6 insignificant.
I hope these are the last fixes in this function))) https://github.com/MiXaiLL76/faster_coco_eval/commit/71b245df2ccd5b8af9332cf5ec438a2ab9174f85
Please, friend, run your tests again and compare the AP results between the libraries. I ran opencv vs mask_api and got a 100% match; I think I did it!
The error with the borders looked something like this
Hey @MiXaiLL76, I just tested the last fix and can confirm that it removes the numerical differences :) Now pycocotools, boundary-iou, and faster-coco-eval produce identical* results, with faster-coco-eval being, well, much faster :D I tested with the results file I provided and with results from a different checkpoint, just to verify.
Thanks for the effort and congrats on this achievement! This is really cool and quite useful 🥳
*As shown above, some metrics differ by around 10^-17, which I would consider negligible. Probably some rounding error or numerical-resolution issue, but not worth putting more work into.
Yes, errors around 10^-17 are usually an artifact of converting floating point numbers between C++ and Python.
It's not critical anywhere; I think it's a win. Thanks for participating in the development of the framework!
Now I'll work on the visualization functions =)
Thanks for implementing this! Really cool to have a fast boundary IoU in faster_coco_eval. Looking forward to the 1.6 release :)
I'll close the PR now