
How to make predictions using image bytes instead of image paths

Open XieKaiwen opened this issue 1 year ago • 6 comments

At the moment I am running inference on images following the code in the inference.ipynb file. However, I have realised that it relies on image paths to perform the inference.

Due to limitations in the API service I have to integrate this with (it sends over image bytes), it would be much better for me to run inference directly on image bytes. Is there a way to do this?

XieKaiwen avatar May 25 '24 06:05 XieKaiwen

There are several ways: (1) if you're using YOLO-World for open-vocabulary detection, see https://github.com/AILab-CVC/YOLO-World/blob/master/demo/simple_demo.py#L48, which takes a numpy.array as input. You can decode the bitstream into a numpy.array.

(2) if you're using YOLO-World for closed-set detection, you can also use the ONNX models, which might be much more efficient.
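
For reference, a minimal sketch of the decoding step in (1), using OpenCV (the helper name is mine):

import cv2
import numpy as np

def bytes_to_array(image_bytes: bytes) -> np.ndarray:
    # decode an encoded bitstream (e.g. JPEG/PNG bytes) into a BGR numpy array
    buffer = np.frombuffer(image_bytes, dtype=np.uint8)
    image = cv2.imdecode(buffer, cv2.IMREAD_COLOR)
    if image is None:
        raise ValueError("could not decode image bytes")
    return image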

wondervictor avatar May 25 '24 08:05 wondervictor

@wondervictor

File "/home/jupyter/til-24-base/vlm/YOLO-World/VLMManager.py", line 25, in init self.model = init_detector(cfg, checkpoint=checkpoint, device='cuda:0') File "/home/jupyter/til-24-base/vlm/YOLOvenv/lib/python3.10/site-packages/mmdet/apis/inference.py", line 102, in init_detector metainfo = DATASETS.build(test_dataset_cfg).metainfo File "/home/jupyter/til-24-base/vlm/YOLOvenv/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build return self.build_func(cfg, *args, **kwargs, registry=self) File "/home/jupyter/til-24-base/vlm/YOLOvenv/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg obj = obj_cls(**args) # type: ignore File "/home/jupyter/til-24-base/vlm/YOLO-World/yolo_world/datasets/mm_dataset.py", line 25, in init self.dataset = DATASETS.build(dataset) File "/home/jupyter/til-24-base/vlm/YOLOvenv/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build return self.build_func(cfg, *args, **kwargs, registry=self) File "/home/jupyter/til-24-base/vlm/YOLOvenv/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg obj = obj_cls(**args) # type: ignore File "/home/jupyter/til-24-base/vlm/YOLOvenv/lib/python3.10/site-packages/mmyolo/datasets/yolov5_coco.py", line 19, in init super().init(*args, **kwargs) File "/home/jupyter/til-24-base/vlm/YOLOvenv/lib/python3.10/site-packages/mmdet/datasets/base_det_dataset.py", line 51, in init super().init(*args, **kwargs) File "/home/jupyter/til-24-base/vlm/YOLOvenv/lib/python3.10/site-packages/mmengine/dataset/base_dataset.py", line 247, in init self.full_init() File "/home/jupyter/til-24-base/vlm/YOLOvenv/lib/python3.10/site-packages/mmyolo/datasets/yolov5_coco.py", line 27, in full_init self.data_list = self.load_data_list() File "/home/jupyter/til-24-base/vlm/YOLOvenv/lib/python3.10/site-packages/mmdet/datasets/lvis.py", line 600, in load_data_list raise ImportError( ImportError: Package lvis is not installed. Please run "pip install git+https://github.com/lvis-dataset/lvis-api.git".

This happened when I tried to use the code in simple_demo.py. Please advise on how to resolve this error (ideally without installing lvis, to keep dependencies down). My code is below:

import numpy as np
import torch
from mmengine.config import Config
from mmengine.dataset import Compose
from mmengine.runner import Runner
from mmengine.runner.amp import autocast
from mmyolo.registry import RUNNERS
from torchvision.ops import nms
import PIL.Image
import cv2
import supervision as sv
import os
import json


if __name__ == "__main__":
    # load config
    cfg = Config.fromfile(
        "../configs/pretrain/yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py"
    )
    cfg.work_dir = "."
    # cfg.load_from = "../training_epochs/total_30_epochs/epoch_20.pth"
    cfg.load_from = "../pretrained_weights/yolo_world_v2_l_obj365v1_goldg_pretrain_1280ft-9babe3f6.pth"
    runner = Runner.from_cfg(cfg)
    runner.call_hook("before_run")
    runner.load_or_resume()
    pipeline = cfg.test_dataloader.dataset.pipeline
    runner.pipeline = Compose(pipeline)

    # run model evaluation
    runner.model.eval()
    
def colorstr(*input):
    """Helper function for styled logging."""
    *args, string = input if len(input) > 1 else ("bold", input[0])
    colors = {"bold": "\033[1m", "end": "\033[0m"}

    return "".join(colors[x] for x in args) + f"{string}" + colors["end"]


bounding_box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator(text_position=sv.Position.CENTER)
mask_annotator = sv.MaskAnnotator()

# # class_names = ("grey camouflage fighter jet . grey and white fighter plane . white and black drone . white and black fighter jet . white missile . black and white commercial aircraft")
# class_names = ("grey missile . red, white, and blue light aircraft . green and black missile . white and red helicopter .")
# class_names.strip()

with open("../../augmented_yolo_train_dataset/annotations/yolo_world_train.json", 'r') as f:
    dataset = json.load(f)



def run_image(
        runner,
        input_image,
        class_names,
        output_image,
        max_num_boxes=100,
        score_thr=0.05,
        nms_thr=0.5,
):
    print(f"Analysing {input_image}...\n class_name: {class_names}")
    print("=================================================================================================================")
    
    
    output_image = "./runs/detect/"+output_image
    class_names = class_names.strip()
    texts = [[t.strip()] for t in class_names.split(".")] + [[" "]]
    print(texts)
    data_info = runner.pipeline(dict(img_id=0, img_path=input_image,
                                     texts=texts))
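    # NOTE: this pipeline loads the image from img_path. To run on raw bytes
    # instead, decode them to a numpy array (e.g. with cv2.imdecode) and pass
    # it as img=<array>; the pipeline's first transform must then read from an
    # ndarray (e.g. mmdet's LoadImageFromNDArray) rather than from a file.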

    data_batch = dict(
        inputs=data_info["inputs"].unsqueeze(0),
        data_samples=[data_info["data_samples"]],
    )

    with autocast(enabled=False), torch.no_grad():
        output = runner.model.test_step(data_batch)[0]
        runner.model.class_names = texts
        pred_instances = output.pred_instances

    # nms
    keep_idxs = nms(pred_instances.bboxes, pred_instances.scores, iou_threshold=nms_thr)
    pred_instances = pred_instances[keep_idxs]
    pred_instances = pred_instances[pred_instances.scores.float() > score_thr]

    if len(pred_instances.scores) > max_num_boxes:
        indices = pred_instances.scores.float().topk(max_num_boxes)[1]
        pred_instances = pred_instances[indices]
    output.pred_instances = pred_instances

    # predictions
    pred_instances = pred_instances.cpu().numpy()
    print(pred_instances)

    if 'masks' in pred_instances:
        masks = pred_instances['masks']
    else:
        masks = None
        
    detections = sv.Detections(
        xyxy=pred_instances['bboxes'],
        class_id=pred_instances['labels'],
        confidence=pred_instances['scores']
    )

    # label ids with confidence scores
    labels = [
        f"{class_id} {confidence:0.2f}"
        for class_id, confidence
        in zip(detections.class_id, detections.confidence)
    ]

    # draw bounding box with label
    image = PIL.Image.open(input_image)
    svimage = np.array(image)
    svimage = bounding_box_annotator.annotate(svimage, detections)
    svimage = label_annotator.annotate(svimage, detections, labels)
    if masks is not None:
        svimage = mask_annotator.annotate(svimage, detections)

    # save output image
    cv2.imwrite(output_image, svimage[:, :, ::-1])
    print(f"Results saved to {colorstr('bold', output_image)}")

    #return svimage[:, :, ::-1]

# print(image_test_inference_list)
for i in range(21, 63):
    image_path = f"../../unaugmented_images_for_testing/image_{i}.jpeg"
    class_names = dataset["images"][i]["caption"]
    file_name = os.path.basename(image_path)
    run_image(runner, image_path, class_names, "pred_" + file_name)
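
To get back to the original question, here is a rough sketch of how a script like this could run on raw bytes instead of a path, based on the ndarray input used by simple_demo.py (the transform swap and the helper below are my own untested assumptions):

# swap the file-loading transform for an ndarray loader (transform name
# assumed from mmdet 3.x), then rebuild the pipeline
pipeline_cfg = cfg.test_dataloader.dataset.pipeline
pipeline_cfg[0].type = "mmdet.LoadImageFromNDArray"
runner.pipeline = Compose(pipeline_cfg)

def run_image_bytes(runner, image_bytes, class_names):
    # decode the encoded bitstream (JPEG/PNG) into a BGR numpy array
    array = cv2.imdecode(np.frombuffer(image_bytes, np.uint8), cv2.IMREAD_COLOR)
    texts = [[t.strip()] for t in class_names.strip().split(".")] + [[" "]]
    # pass the decoded array via img= instead of img_path=
    data_info = runner.pipeline(dict(img_id=0, img=array, texts=texts))
    data_batch = dict(inputs=data_info["inputs"].unsqueeze(0),
                      data_samples=[data_info["data_samples"]])
    with autocast(enabled=False), torch.no_grad():
        output = runner.model.test_step(data_batch)[0]
    return output.pred_instances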
    

XieKaiwen avatar May 26 '24 13:05 XieKaiwen

@wondervictor please advise on how to edit the config files / deal with this error.

The config I used for Config.fromfile() is the file from this GitHub repo, unedited.

XieKaiwen avatar May 28 '24 00:05 XieKaiwen

@wondervictor hi, sorry for troubling you so often, but could I get some advice on this? Must I just pip install from the GitHub link?

XieKaiwen avatar May 28 '24 12:05 XieKaiwen

@wondervictor I have resolved the error by pip installing from the GitHub link. However, I was then met with another error:

File "/home/jupyter/til-24-base/vlm/YOLOvenv/lib/python3.10/site-packages/lvis/lvis.py", line 35, in _load_json
    with open(path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/coco/lvis/lvis_v1_minival_inserted_image_name.json'

I tried to solve this by creating a JSON file with that name under those directories (the JSON was empty). But then I was faced with yet another error:

File "/home/jupyter/til-24-base/vlm/YOLOvenv/lib/python3.10/site-packages/mmyolo/datasets/yolov5_coco.py", line 27, in full_init self.data_list = self.load_data_list() File "/home/jupyter/til-24-base/vlm/YOLOvenv/lib/python3.10/site-packages/mmdet/datasets/lvis.py", line 605, in load_data_list self.lvis = LVIS(local_path) File "/home/jupyter/til-24-base/vlm/YOLOvenv/lib/python3.10/site-packages/lvis/lvis.py", line 27, in __init__ self.dataset = self._load_json(annotation_path) File "/home/jupyter/til-24-base/vlm/YOLOvenv/lib/python3.10/site-packages/lvis/lvis.py", line 36, in _load_json return json.load(f) File "/opt/conda/lib/python3.10/json/__init__.py", line 293, in load return loads(fp.read(), File "/opt/conda/lib/python3.10/json/__init__.py", line 346, in loads return _default_decoder.decode(s) File "/opt/conda/lib/python3.10/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/opt/conda/lib/python3.10/json/decoder.py", line 355, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Does this mean that my JSON file must contain data in order for it to work? Is there a way of bypassing all of these errors while still being able to run inference on image bytes? Thank you very much for any help given! If I have to comment out or edit any of the default config in configs/pretrain/yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py, please advise on that too, because I am not fully sure how to edit the config files for my own purposes.
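
One possible workaround for the JSONDecodeError above, assuming the lvis loader only needs a syntactically valid LVIS-style file (untested, and downstream code may still expect real entries):

import json

# minimal LVIS-style stub: valid JSON with the top-level keys the lvis API indexes
stub = {"images": [], "annotations": [], "categories": []}
with open("data/coco/lvis/lvis_v1_minival_inserted_image_name.json", "w") as f:
    json.dump(stub, f)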

XieKaiwen avatar May 28 '24 17:05 XieKaiwen

In demo/simple_demo.py I commented out the line image = image[:, :, [2, 1, 0]] in inference() and it started giving me better results.
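
For context: that slice swaps the channel order between BGR and RGB, so whether removing it helps depends on how the image was decoded upstream. A sketch of the distinction, assuming OpenCV-style decoding (the file name is hypothetical):

import cv2

# cv2.imread / cv2.imdecode return BGR arrays
image = cv2.imread("example.jpg")  # hypothetical input; array is BGR
rgb = image[:, :, [2, 1, 0]]       # swaps BGR -> RGB (the same slice swaps back)
# if the downstream pipeline already expects BGR, applying this swap hurts results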

gigasurgeon avatar Aug 14 '24 14:08 gigasurgeon


The reason might be the download. Please refer to https://hf-mirror.com/GLIPModel/GLIP/tree/main and download the file. Remember to check that the content is in JSON format. (I used wget to download from hf-mirror, but the file I got was actually in XML format.)
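
A quick way to verify that the downloaded annotation file really is JSON (path taken from the error above):

import json

# raises json.JSONDecodeError if the download is actually HTML/XML
with open("data/coco/lvis/lvis_v1_minival_inserted_image_name.json") as f:
    json.load(f)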

VoyagerXvoyagerx avatar Oct 30 '24 12:10 VoyagerXvoyagerx