tanzhenyu
Transformer models are becoming more and more relevant. Let's support them by adding ViT first, citing https://arxiv.org/pdf/2010.11929.pdf
Related to https://github.com/keras-team/keras-cv/issues/668, this is another transformer-based model that we'd like to support. For simplicity, focus only on the classification task, keeping in mind that future detection tasks might...
Other than RetinaNet, the library should support a heavier model that allows a trade-off between performance and speed. We will start with core components first, and decide what kind of meta...
Completing the story of `Model.fit` for FasterRCNN. This is achieved by overriding `train_step` and `test_step`. Testing it with Pascal VOC dataset, the performance / quality is similar to CTL. Currently...
Today, in order to augment_bbox, users are required to pass `[y_min, x_min, y_max, x_max, class_id]` to the layer input, i.e., concat `gt_boxes` and `gt_classes`. This is a little anti user...
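To make the current requirement concrete, here is a minimal sketch of the packing step users have to do by hand today. It uses plain Python lists instead of tensors, and `pack_boxes_with_classes` is a hypothetical helper name for illustration, not a KerasCV API.

```python
def pack_boxes_with_classes(gt_boxes, gt_classes):
    """Append each class id to its box.

    gt_boxes:   list of [y_min, x_min, y_max, x_max]
    gt_classes: list of class ids, one per box
    Returns a list of [y_min, x_min, y_max, x_max, class_id].
    Hypothetical helper, only to illustrate the concat described above.
    """
    if len(gt_boxes) != len(gt_classes):
        raise ValueError("gt_boxes and gt_classes must have the same length")
    return [list(box) + [cls] for box, cls in zip(gt_boxes, gt_classes)]


packed = pack_boxes_with_classes([[0.1, 0.2, 0.5, 0.6]], [3])
print(packed)
```

With tensors the same step would be a concatenation along the last axis after expanding `gt_classes` by one dimension.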
Running `python examples/models/object_detection/retina_net/basic/pascal_voc/train.py` fails at:
```
Traceback (most recent call last):
  File "/home/overflow/code/keras-cv/examples/models/object_detection/retina_net/basic/pascal_voc/train.py", line 98, in <module>
    visualize_dataset(dataset, bounding_box_format="xywh")
  File "/home/overflow/code/keras-cv/examples/models/object_detection/retina_net/basic/pascal_voc/train.py", line 87, in visualize_dataset
    boxes = keras_cv.bounding_box.convert_format(
  File "/home/overflow/code/keras-cv/keras_cv/bounding_box/converters.py", line...
```
During the development of FasterRCNN, I noticed reasonable losses and reasonable predictions, but consistently got low mAP metrics (~0.18). The evaluation loop takes ~13 minutes. So I instead...
Here's our implementation: https://github.com/keras-team/keras-cv/blame/master/keras_cv/losses/smooth_l1.py#L39-L42 Here's a definition used by PyTorch: https://pytorch.org/docs/stable/generated/torch.nn.SmoothL1Loss.html A beta denominator is missing. It still works fine if the cutoff is 1, but anything else would be...
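For reference, a minimal pure-Python sketch of the elementwise loss as the PyTorch `SmoothL1Loss` documentation defines it, with the `beta` denominator in the quadratic branch. The function name `smooth_l1` and the list-based inputs are assumptions for illustration; this is not the KerasCV implementation.

```python
def smooth_l1(y_true, y_pred, beta=1.0):
    """Elementwise smooth L1 following the PyTorch SmoothL1Loss definition.

    |d| <  beta: 0.5 * d**2 / beta   (note the beta denominator)
    |d| >= beta: |d| - 0.5 * beta
    """
    if beta <= 0:
        raise ValueError("beta must be positive in this sketch")
    losses = []
    for t, p in zip(y_true, y_pred):
        diff = abs(t - p)
        if diff < beta:
            losses.append(0.5 * diff * diff / beta)
        else:
            losses.append(diff - 0.5 * beta)
    return losses


# With beta=1 the denominator has no effect; with beta=2 it halves the quadratic term.
print(smooth_l1([0.0], [0.5], beta=1.0))
print(smooth_l1([0.0], [0.5], beta=2.0))
```

With the denominator in place, both branches evaluate to `0.5 * beta` at `|d| == beta`, so the loss is continuous at the cutoff for any `beta`, not just 1.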
i.e., from `x, y, w, h` to `(x - x_a) / w_a`, `(y - y_a) / h_a`, `log(w / w_a)`, `log(h / h_a)`. Potentially allow box variance as well.
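A minimal sketch of that encoding for a single box and anchor, applying the four formulas as written. The name `encode_box` is hypothetical, and dividing each delta by its variance is one common convention that is assumed here; the issue does not specify how variance should be applied, nor whether `x, y` are centers or corners.

```python
import math


def encode_box(box, anchor, variance=(1.0, 1.0, 1.0, 1.0)):
    """Encode (x, y, w, h) relative to an anchor (x_a, y_a, w_a, h_a).

    Hypothetical helper. Assumes positive widths and heights, and that
    variance divides each delta (an assumption, not from the issue).
    """
    x, y, w, h = box
    x_a, y_a, w_a, h_a = anchor
    deltas = (
        (x - x_a) / w_a,
        (y - y_a) / h_a,
        math.log(w / w_a),
        math.log(h / h_a),
    )
    return tuple(d / v for d, v in zip(deltas, variance))


print(encode_box((12.0, 8.0, 4.0, 2.0), (10.0, 10.0, 4.0, 2.0)))
```

A box identical to its anchor encodes to all zeros, which is a quick sanity check for any implementation of this transform.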