
Can I have a more specific usage example (code snippet) in README.md?

stevezkw1998 opened this issue on Jan 31 '24 · 5 comments

I want to get started with this step: https://github.com/SysCV/sam-hq?tab=readme-ov-file#getting-started

from segment_anything import SamPredictor, sam_model_registry
sam = sam_model_registry["<model_type>"](checkpoint="<path/to/checkpoint>")
predictor = SamPredictor(sam)
predictor.set_image(<your_image>)
masks, _, _ = predictor.predict(<input_prompts>)

But placeholder parameters like "<model_type>", "<path/to/checkpoint>", ... are not specific enough for me to understand how to use this tool.

For example, what should <your_image> be: an absolute path, an image object, or a multi-dimensional array?

I would appreciate it if you could provide a specific usage example for HQ-SAM inference. Thanks!

stevezkw1998 · Jan 31 '24

Hi, we provide a demo file for you to refer to: python demo/demo_hqsam.py.

You can also refer to the Colab notebook here.
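
For a minimal, self-contained sketch of what the README snippet looks like when filled in (the model type and checkpoint path below are illustrative; match them to whichever HQ-SAM checkpoint you download):

import cv2
import numpy as np
import torch
from segment_anything import SamPredictor, sam_model_registry

# "vit_h" pairs with the sam_hq_vit_h.pth checkpoint; adjust both together.
sam = sam_model_registry["vit_h"](checkpoint="pretrained_checkpoint/sam_hq_vit_h.pth")
sam.to(device="cuda" if torch.cuda.is_available() else "cpu")
predictor = SamPredictor(sam)

# set_image expects an HxWx3 uint8 RGB array, not a file path.
image = cv2.cvtColor(cv2.imread("demo/input_imgs/example0.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One box prompt; masks has shape (num_masks, H, W).
masks, scores, logits = predictor.predict(
    box=np.array([[4, 13, 1007, 1023]]),
    multimask_output=False,
)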

lkeab (Collaborator) · Jan 31 '24

Got it, thank you for your help. I will go through it. In my humble view, maybe only sam_checkpoint is necessary to load the model, since the model_type could be inferred from the checkpoint itself; fewer inputs would make your tool even easier to use :)
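
For example, just a rough, untested sketch of what I mean:

import os

# Rough idea: guess the model type from the checkpoint filename,
# e.g. "sam_hq_vit_h.pth" -> "vit_h".
def infer_model_type(checkpoint_path):
    name = os.path.basename(checkpoint_path)
    for model_type in ("vit_tiny", "vit_h", "vit_l", "vit_b"):
        if model_type in name:
            return model_type
    raise ValueError("Cannot infer model type from: " + name)

checkpoint = "pretrained_checkpoint/sam_hq_vit_h.pth"
sam = sam_model_registry[infer_model_type(checkpoint)](checkpoint=checkpoint)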

stevezkw1998 · Jan 31 '24

I noticed these lines:

device = "cuda"
sam.to(device=device)

I am not sure what device I should pass if I run this as a Docker image in a Kubeflow pipeline (Kubeflow will pre-allocate one GPU for it). I would be very grateful if someone could provide some suggestions, thanks.
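
For now I am guessing a device-agnostic pattern like this would work (please correct me if not):

import torch

# Guessing: use the GPU if Kubeflow actually allocated one, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
sam.to(device=device)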

stevezkw1998 · Jan 31 '24

And in these lines:

input_box = np.array([[4,13,1007,1023]])
input_point, input_label = None, None

is the bbox in [w, h, x, y] format or [x1, y1, x2, y2] format?
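
If it helps, I was planning to sanity-check it myself by drawing the box (assuming [x1, y1, x2, y2]) like this:

import cv2

# My own sanity check: draw the box assuming [x1, y1, x2, y2] and see
# whether the rectangle frames the intended object.
image_check = cv2.imread("demo/input_imgs/example0.png")
x1, y1, x2, y2 = 4, 13, 1007, 1023
cv2.rectangle(image_check, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.imwrite("box_check.png", image_check)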

stevezkw1998 · Jan 31 '24

And I have another question:

image = cv2.imread('demo/input_imgs/example0.png')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
input_box = np.array([[4,13,1007,1023]])
input_point, input_label = None, None
predictor.set_image(image)
masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    box=input_box,
    multimask_output=False,
    hq_token_only=False,
)

If I input a bbox that covers the whole image and set multimask_output=True, will I get multi-object results after inference?
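
Concretely, something like:

h, w = image.shape[:2]

# What I mean: a box spanning the whole image
# (assuming [x1, y1, x2, y2] format).
masks, scores, logits = predictor.predict(
    box=np.array([[0, 0, w, h]]),
    multimask_output=True,
)
# Would each returned mask then correspond to a different object?
print(masks.shape, scores)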

stevezkw1998 · Jan 31 '24