
Hi @xuanlinli17, can you give an explicit example of the format for multiple bboxes?

MenSanYan opened this issue 11 months ago · 4 comments

Hi @xuanlinli17,

Thank you for bringing up the issue. We truly appreciate your efforts in helping us solve it!

After carefully reviewing your pull request, I noticed that there might be a misunderstanding regarding the input format of points. Specifically, for ONNX/TensorRT inference, the default image batch size is 1. As for args.point, it should have the shape [B, N, 3], where B is the number of output masks (not the image batch size), N is the number of prompt points you provide for each mask (pad by repeating a point if you have fewer than N), and 3 holds the coordinates plus the label, in the format (x, y, label).
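
To make the format concrete, here is a minimal sketch of building such an array (the coordinates are made up for illustration):

import numpy as np

# B = 2 output masks, N = 2 prompt points per mask, each point is (x, y, label).
# Mask 1 has only one real point, so it is padded by repeating that point.
point = np.array(
    [
        [[200, 150, 1], [320, 210, 1]],  # mask 0: two foreground points (label 1)
        [[500, 400, 1], [500, 400, 1]],  # mask 1: one point, repeated as padding
    ],
    dtype=np.float32,
)
print(point.shape)  # (2, 2, 3) == [B, N, 3]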

I have made the necessary updates to the code to address the issue. Your pull request was really helpful for my modifications. You can try it out and see whether it works for you now. Should you have any further questions, please don't hesitate to reach out. Once again, thank you for your valuable contribution!

Best, Zhuoyang

Originally posted by @zhuoyang20 in https://github.com/mit-han-lab/efficientvit/issues/77#issuecomment-1987133496

MenSanYan commented on Mar 14 '24

Hi MenSanYan,

To perform TensorRT inference on multiple boxes, you can run the following command:

python deployment/sam/tensorrt/inference.py --model xl1 --encoder_engine assets/export_models/sam/tensorrt/xl1_encoder.engine --decoder_engine assets/export_models/sam/tensorrt/xl1_decoder.engine --img_path assets/fig/my_example.jpg --mode boxes --boxes "[[x1,y1,x2,y2],[x3,y3,x4,y4]]"
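
For example, with two hypothetical boxes (the pixel coordinates below are made up for illustration):

python deployment/sam/tensorrt/inference.py --model xl1 --encoder_engine assets/export_models/sam/tensorrt/xl1_encoder.engine --decoder_engine assets/export_models/sam/tensorrt/xl1_decoder.engine --img_path assets/fig/my_example.jpg --mode boxes --boxes "[[80,60,300,260],[350,120,620,480]]"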

Best, Zhuoyang

zhuoyang20 commented on Mar 15 '24

Hi @zhuoyang20, I am curious how to change the decoder input from points to boxes in deployment/sam/onnx/export_decoder.py. I find that in the export demo, the input is set up as points, not boxes. Thanks for your reply!

onnx_model = DecoderOnnxModel(
    model=efficientvit_sam,
    return_single_mask=return_single_mask,
)

# Both prompt tensors share dynamic batch and point dimensions.
dynamic_axes = {
    "point_coords": {0: "batch_size", 1: "num_points"},
    "point_labels": {0: "batch_size", 1: "num_points"},
}

embed_dim = efficientvit_sam.prompt_encoder.embed_dim
embed_size = efficientvit_sam.prompt_encoder.image_embedding_size
# Dummy inputs used to trace the model: 16 prompts with 2 points each.
dummy_inputs = {
    "image_embeddings": torch.randn(1, embed_dim, *embed_size, dtype=torch.float),
    "point_coords": torch.randint(low=0, high=1024, size=(16, 2, 2), dtype=torch.float),
    "point_labels": torch.randint(low=0, high=4, size=(16, 2), dtype=torch.float),
}

# Run one forward pass to make sure the model traces without errors.
_ = onnx_model(**dummy_inputs)

output_names = ["masks", "iou_predictions"]

if not os.path.exists(os.path.dirname(output)):
    os.makedirs(os.path.dirname(output))

with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=torch.jit.TracerWarning)
    warnings.filterwarnings("ignore", category=UserWarning)
    with open(output, "wb") as f:
        print(f"Exporting onnx model to {output}...")
        torch.onnx.export(
            onnx_model,
            tuple(dummy_inputs.values()),
            f,
            export_params=True,
            verbose=False,
            opset_version=opset,
            do_constant_folding=True,
            input_names=list(dummy_inputs.keys()),
            output_names=output_names,
            dynamic_axes=dynamic_axes,
        )

# Simplify the exported graph and verify the result before overwriting it.
onnx_path = str(output)
onnx_model = onnx.load(onnx_path)
model_sim, check = simplify(onnx_model)
assert check, "Simplified ONNX model could not be validated"
onnx.save(model_sim, output)
print("Successfully simplified the ONNX model.")

yangchengxin commented on Apr 1 '24

I have fixed this problem: point inference and box inference can share the same ONNX model. I only need to set the corresponding labels to determine whether the input is a point or a box (I find that label 1 represents a point, while labels 2 and 3 represent the box's corner coordinates?).
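
To make that concrete, here is a minimal sketch of encoding a box as two labeled corner points for the same ONNX decoder, following SAM's prompt-encoding convention (the helper name and coordinates are made up for illustration):

import numpy as np

# Following SAM's prompt encoding: label 2 marks the top-left box corner and
# label 3 the bottom-right corner, while label 1 is a foreground point and
# label 0 a background point.
def box_to_point_prompt(box):
    x1, y1, x2, y2 = box
    coords = np.array([[[x1, y1], [x2, y2]]], dtype=np.float32)  # (1, 2, 2)
    labels = np.array([[2, 3]], dtype=np.float32)                # (1, 2)
    return coords, labels

# Hypothetical box (x1, y1, x2, y2); feed the result to the decoder as
# point_coords and point_labels.
point_coords, point_labels = box_to_point_prompt((80, 60, 300, 260))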

yangchengxin commented on Apr 3 '24

Hi @yangchengxin, have you solved this problem, and can you share the code?

DAVID-Hown commented on Jul 25 '24